Home » How to Calculate Cosine Similarity in Python

How to Calculate Cosine Similarity in Python

by Erma Khan

Cosine Similarity is a measure of the similarity between two vectors of an inner product space.

For two vectors, A and B, the Cosine Similarity is calculated as:

Cosine Similarity = ΣAiBi / (√ΣAi2√ΣBi2)

This tutorial explains how to calculate the Cosine Similarity between vectors in Python using functions from the NumPy library.

Cosine Similarity Between Two Vectors in Python

The following code shows how to calculate the Cosine Similarity between two arrays in Python:

from numpy import dot
from numpy.linalg import norm

#define arrays
a = [23, 34, 44, 45, 42, 27, 33, 34]
b = [17, 18, 22, 26, 26, 29, 31, 30]

#calculate Cosine Similarity
cos_sim = dot(a, b)/(norm(a)*norm(b))

cos_sim

0.965195008357566

The Cosine Similarity between the two arrays turns out to be 0.965195.

Note that this method will work on two arrays of any length:

import numpy as np
from numpy import dot
from numpy.linalg import norm

#define arrays
a = np.random.randint(10, size=100)
b = np.random.randint(10, size=100)

#calculate Cosine Similarity
cos_sim = dot(a, b)/(norm(a)*norm(b))

cos_sim

0.7340201613960431

However, it only works if the two arrays are of equal length:

import numpy as np
from numpy import dot
from numpy.linalg import norm

#define arrays
a = np.random.randint(10, size=90) #length=90
b = np.random.randint(10, size=100) #length=100

#calculate Cosine Similarity
cos_sim = dot(a, b)/(norm(a)*norm(b))

cos_sim

ValueError: shapes (90,) and (100,) not aligned: 90 (dim 0) != 100 (dim 0)

Notes

1. There are multiple ways to calculate the Cosine Similarity using Python, but as this Stack Overflow thread explains, the method explained in this post turns out to be the fastest.

2. Refer to this Wikipedia page to learn more details about Cosine Similarity.

Related Posts