
How to compute the similarity between two text documents with Python?

Sometimes, we want to compute the similarity between two text documents with Python.

In this article, we’ll look at how to compute the similarity between two text documents with Python.

How to compute the similarity between two text documents with Python?

To compute the similarity between two text documents with Python, we can use the scikit-learn library.

To install it, we run

pip install -U scikit-learn

Then we use it by writing

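from sklearn.feature_extraction.text import TfidfVectorizer

# text_files is a list of paths to the documents we want to compare
documents = []
for f in text_files:
    with open(f) as fh:  # the with block closes each file after reading
        documents.append(fh.read())
tfidf = TfidfVectorizer().fit_transform(documents)
pairwise_similarity = tfidf * tfidf.T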

to open the files with the paths in the text_files list.

Then we create a TfidfVectorizer object and call fit_transform with the strings returned by read. This returns a sparse TF-IDF matrix with one row per document.

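And then we get their pairwise similarity with tfidf * tfidf.T. Since TfidfVectorizer normalizes each row to unit length by default, this product is the cosine similarity matrix, and the entry at row i and column j is the similarity between documents i and j.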
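
As a minimal sketch of the steps above, the example below uses three short in-memory strings in place of files. The sample sentences and the names score_01 and score_02 are made up for illustration. It also uses the @ operator instead of *, which is an assumption about style rather than a requirement, and converts the sparse result to a dense array so individual scores are easy to read.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical sample documents standing in for the contents of text files.
documents = [
    "The cat sat on the mat.",
    "A cat was sitting on a mat.",
    "Stock prices fell sharply today.",
]

# Build the TF-IDF matrix: one row per document, rows L2-normalized by default.
tfidf = TfidfVectorizer().fit_transform(documents)

# Multiply by the transpose to get cosine similarities, then make it dense.
pairwise_similarity = (tfidf @ tfidf.T).toarray()

# Similarity between document 0 and document 1 (they share several words).
score_01 = pairwise_similarity[0, 1]
# Similarity between document 0 and document 2 (they share no words).
score_02 = pairwise_similarity[0, 2]

print(pairwise_similarity.round(3))
```

Each value on the diagonal compares a document with itself, so it should be 1. To compare only two documents, we pass in a list with two strings and read the entry at [0, 1].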

Conclusion

To compute the similarity between two text documents with Python, we can use the TfidfVectorizer class from the scikit-learn library and multiply the resulting TF-IDF matrix by its transpose.

By John Au-Yeung

Web developer specializing in React, Vue, and front end development.
