How do I calculate cosine similarity in Python between two number lists?
I want to calculate the cosine similarity between two lists, for example list 1, which is dataSetI, and list 2, which is dataSetII.
For instance, dataSetI is [3, 45, 7, 2], and dataSetII is [2, 54, 13, 15]. The length of the lists is always equal. I want to compute the cosine similarity as a number between 0 and 1.
dataSetI = [3, 45, 7, 2]
dataSetII = [2, 54, 13, 15]
def cosine_similarity(list1, list2):
    # How do I calculate the cosine similarity here?
    ...
print(cosine_similarity(dataSetI, dataSetII))
How can I implement this cosine_similarity function in Python?
One option is to calculate cosine similarity with NumPy, which is optimized for vector operations.
import numpy as np
def cosine_similarity(list1, list2):
    # Convert lists to numpy arrays
    list1 = np.array(list1)
    list2 = np.array(list2)
    # Calculate the dot product and the magnitude of the vectors
    dot_product = np.dot(list1, list2)
    magnitude1 = np.linalg.norm(list1)
    magnitude2 = np.linalg.norm(list2)
    # Return cosine similarity
    return dot_product / (magnitude1 * magnitude2)
dataSetI = [3, 45, 7, 2]
dataSetII = [2, 54, 13, 15]
print(cosine_similarity(dataSetI, dataSetII))
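
The NumPy version above divides by the product of the magnitudes, so a list of all zeros produces a division by zero (NumPy returns nan with a warning). A minimal sketch of a guarded variant follows. The name safe_cosine_similarity is hypothetical, and returning 0.0 for a zero vector is a convention assumed here, not something the question specifies.

```python
import numpy as np

def safe_cosine_similarity(list1, list2):
    """Cosine similarity with input checks (hypothetical helper).

    Assumption: a zero-length vector yields 0.0 instead of nan.
    """
    a = np.asarray(list1, dtype=float)
    b = np.asarray(list2, dtype=float)
    # The question guarantees equal lengths; fail loudly if that is violated
    if a.shape != b.shape:
        raise ValueError("both lists must have the same length")
    denominator = np.linalg.norm(a) * np.linalg.norm(b)
    # Avoid dividing by zero when either vector is all zeros
    if denominator == 0.0:
        return 0.0
    return float(np.dot(a, b) / denominator)

print(safe_cosine_similarity([3, 45, 7, 2], [2, 54, 13, 15]))
print(safe_cosine_similarity([0, 0, 0, 0], [2, 54, 13, 15]))
```

Raising an error instead of returning 0.0 would be an equally reasonable choice, depending on how the caller wants to treat empty data.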
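Alternatively, scikit-learn provides a cosine_similarity function in sklearn.metrics.pairwise. It expects 2D inputs (one row per vector) and returns a matrix of pairwise similarities, so each list is wrapped in an outer list and the single result is read from position [0][0].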
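from sklearn.metrics.pairwise import cosine_similarity as sk_cosine_similarity
def cosine_similarity(list1, list2):
    # Wrap each list so it becomes a 2D input with a single row. The import is
    # aliased so that this function does not shadow it and call itself forever.
    return sk_cosine_similarity([list1], [list2])[0][0]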
dataSetI = [3, 45, 7, 2]
dataSetII = [2, 54, 13, 15]
print(cosine_similarity(dataSetI, dataSetII))
You can also calculate cosine similarity without third-party libraries by applying the formula directly: the dot product of the two lists divided by the product of their magnitudes.
import math
def cosine_similarity(list1, list2):
    # Compute dot product and magnitudes manually
    dot_product = sum(x * y for x, y in zip(list1, list2))
    magnitude1 = math.sqrt(sum(x ** 2 for x in list1))
    magnitude2 = math.sqrt(sum(x ** 2 for x in list2))
    # Return cosine similarity
    return dot_product / (magnitude1 * magnitude2)
dataSetI = [3, 45, 7, 2]
dataSetII = [2, 54, 13, 15]
print(cosine_similarity(dataSetI, dataSetII))
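
For reference, here is the formula the manual version follows, worked through with the example lists. The symbols A and B for dataSetI and dataSetII are notation introduced here. The intermediate sums are exact and the final value is rounded.

```latex
\[
\cos\theta
  = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
  = \frac{\sum_{i} A_i B_i}{\sqrt{\sum_{i} A_i^2}\,\sqrt{\sum_{i} B_i^2}}
\]
\[
A \cdot B = 3 \cdot 2 + 45 \cdot 54 + 7 \cdot 13 + 2 \cdot 15 = 2557
\]
\[
\lVert A \rVert^2 = 3^2 + 45^2 + 7^2 + 2^2 = 2087,
\qquad
\lVert B \rVert^2 = 2^2 + 54^2 + 13^2 + 15^2 = 3314
\]
\[
\cos\theta = \frac{2557}{\sqrt{2087 \cdot 3314}} \approx 0.9723
\]
```

Because every element in these lists is non-negative, the result falls between 0 and 1. With negative elements, cosine similarity can range from -1 to 1.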
Each of these approaches computes the cosine similarity of the two lists. For the example data the result is approximately 0.9723.