How do I calculate cosine similarity in Python between two number lists?

I want to calculate the cosine similarity between two lists, for example list 1, which is dataSetI, and list 2, which is dataSetII.

For instance, dataSetI is [3, 45, 7, 2] and dataSetII is [2, 54, 13, 15]. The two lists always have the same length. I want the cosine similarity as a number between 0 and 1.

dataSetI = [3, 45, 7, 2]
dataSetII = [2, 54, 13, 15]

def cosine_similarity(list1, list2):
    # How do I calculate the cosine similarity here?
    ...

print(cosine_similarity(dataSetI, dataSetII))

How can I implement cosine_similarity so that the call above prints the similarity of the two lists?

You can calculate cosine similarity with NumPy, which is optimized for vector operations.

import numpy as np

def cosine_similarity(list1, list2):
    # Convert lists to numpy arrays
    list1 = np.array(list1)
    list2 = np.array(list2)
    
    # Calculate the dot product and the magnitude of the vectors
    dot_product = np.dot(list1, list2)
    magnitude1 = np.linalg.norm(list1)
    magnitude2 = np.linalg.norm(list2)
    
    # Return cosine similarity
    return dot_product / (magnitude1 * magnitude2)

dataSetI = [3, 45, 7, 2]
dataSetII = [2, 54, 13, 15]
print(cosine_similarity(dataSetI, dataSetII))
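
One edge case the function above does not cover is a list that contains only zeros: its norm is 0, so the division yields nan along with a runtime warning. A minimal sketch of a guarded variant follows. The name safe_cosine_similarity and the choice to return 0.0 for a zero vector are assumptions made for illustration, not part of the original answer; raising an exception would be an equally reasonable choice.

```python
import numpy as np

def safe_cosine_similarity(list1, list2):
    # Hypothetical helper: same formula as above, plus a zero-vector guard
    a = np.asarray(list1, dtype=float)
    b = np.asarray(list2, dtype=float)
    denominator = np.linalg.norm(a) * np.linalg.norm(b)
    if denominator == 0.0:
        # Assumption: treat similarity with an all-zero vector as 0.0
        return 0.0
    return float(np.dot(a, b) / denominator)

print(safe_cosine_similarity([3, 45, 7, 2], [2, 54, 13, 15]))  # about 0.9723
print(safe_cosine_similarity([0, 0, 0, 0], [2, 54, 13, 15]))   # 0.0
```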

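Scikit-learn provides a cosine_similarity function in sklearn.metrics.pairwise. It expects 2D input (one row per vector) and returns a matrix of pairwise similarities, so wrap each list in an outer list and take element [0][0] of the result.

# Import under an alias: a wrapper named cosine_similarity would otherwise
# shadow the library function and call itself recursively.
from sklearn.metrics.pairwise import cosine_similarity as sk_cosine_similarity

def cosine_similarity(list1, list2):
    # Wrap each list in an outer list to form a 1 x n 2D input
    return sk_cosine_similarity([list1], [list2])[0][0]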

dataSetI = [3, 45, 7, 2]
dataSetII = [2, 54, 13, 15]
print(cosine_similarity(dataSetI, dataSetII))

You can also calculate cosine similarity with only the standard library by applying the formula directly: the dot product of the two vectors divided by the product of their magnitudes.

import math

def cosine_similarity(list1, list2):
    # Compute dot product and magnitudes manually
    dot_product = sum(x * y for x, y in zip(list1, list2))
    magnitude1 = math.sqrt(sum(x ** 2 for x in list1))
    magnitude2 = math.sqrt(sum(x ** 2 for x in list2))
    
    # Return cosine similarity
    return dot_product / (magnitude1 * magnitude2)

dataSetI = [3, 45, 7, 2]
dataSetII = [2, 54, 13, 15]
print(cosine_similarity(dataSetI, dataSetII))
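
To make the arithmetic concrete, here is the same calculation for the example lists with the intermediate values spelled out. This is only an illustrative walk-through; the variable names squared_magnitude1, squared_magnitude2, and similarity are not from the original answer.

```python
import math

dataSetI = [3, 45, 7, 2]
dataSetII = [2, 54, 13, 15]

# Dot product: 3*2 + 45*54 + 7*13 + 2*15 = 2557
dot_product = sum(x * y for x, y in zip(dataSetI, dataSetII))

# Squared magnitudes: 9 + 2025 + 49 + 4 = 2087 and 4 + 2916 + 169 + 225 = 3314
squared_magnitude1 = sum(x * x for x in dataSetI)
squared_magnitude2 = sum(x * x for x in dataSetII)

# sqrt(a) * sqrt(b) equals sqrt(a * b), so a single square root is enough
similarity = dot_product / math.sqrt(squared_magnitude1 * squared_magnitude2)

print(dot_product, squared_magnitude1, squared_magnitude2)  # 2557 2087 3314
print(round(similarity, 4))  # 0.9723
```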

All three approaches return the same cosine similarity for the two lists. For the example data the result is approximately 0.9723.
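
On the requirement that the result lie between 0 and 1: that holds when every element is non-negative, as in the example data. In general, cosine similarity ranges from -1 to 1. The short check below illustrates this with the pure-Python formula; the helper name cosine and the sample vectors are made up for the demonstration.

```python
import math

def cosine(u, v):
    # Hypothetical helper: dot product divided by the product of magnitudes
    dot_product = sum(x * y for x, y in zip(u, v))
    return dot_product / (math.hypot(*u) * math.hypot(*v))

same_direction = cosine([1, 2, 3], [2, 4, 6])    # close to 1
orthogonal = cosine([1, 0], [0, 1])              # 0.0
opposite = cosine([1, 2, 3], [-1, -2, -3])       # close to -1

print(same_direction, orthogonal, opposite)
```

If negative values can occur and a 0-to-1 score is still needed, the result has to be rescaled or clipped, and which of those is appropriate depends on the application.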