Cosine Similarity

KoshurAI
Mar 12, 2023

Cosine similarity is a popular similarity measure used in machine learning, natural language processing, and information retrieval. It measures how closely two non-zero vectors point in the same direction, regardless of their lengths, which makes it widely used in text analysis and recommendation systems. In this article, we will discuss the math behind cosine similarity, its practical applications, and how to calculate it using Python.

Math Behind Cosine Similarity

Cosine similarity measures the cosine of the angle between two non-zero vectors of an inner product space. Its value ranges from -1 to 1. If the angle between the two vectors is small, the cosine is close to 1, indicating that the vectors point in nearly the same direction and are similar. If the angle is 90 degrees, the cosine is 0, indicating that the vectors are orthogonal and share nothing in common. If the angle approaches 180 degrees, the cosine approaches -1, indicating that the vectors point in opposite directions. Because only the angle matters, the measure ignores the magnitudes of the vectors. For vectors whose components are all non-negative, such as word counts, the value always falls between 0 and 1.
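To make the relationship between angle and cosine value concrete, here is a minimal sketch that rotates a unit vector away from a fixed reference vector and prints the resulting similarity. The helper name `cosine_between` and the chosen angles are illustrative assumptions, not part of the original article.

```python
import math


def cosine_between(u, v):
    """Cosine of the angle between two non-zero vectors (hypothetical helper)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


reference = (1.0, 0.0)

# Rotate a unit vector by increasing angles and compare it to the reference.
for degrees in (0, 45, 90, 135, 180):
    theta = math.radians(degrees)
    rotated = (math.cos(theta), math.sin(theta))
    print(f"{degrees:3d} degrees -> {cosine_between(reference, rotated):+.4f}")
```

The printed values fall from 1 at 0 degrees, through roughly 0 at 90 degrees, to -1 at 180 degrees.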

The formula for calculating the cosine similarity between two vectors A and B is:

cosine_similarity(A, B) = (A . B) / (||A|| * ||B||)

where A and B are two non-zero vectors, A . B is their dot product (the sum of the products of their corresponding components), and ||A|| and ||B|| are the magnitudes of vectors A and B respectively. The magnitude of a vector is calculated by taking the square root of the sum of the squares of its components.
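The formula translates almost directly into Python. The following is a minimal sketch using only the standard library. The function name `cosine_similarity` mirrors the formula above, while the error handling for mismatched lengths and zero vectors is an added assumption about how one might want such a function to behave.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return (a . b) / (||a|| * ||b||) for two non-zero vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of components")

    # Dot product: sum of the products of corresponding components.
    dot = sum(x * y for x, y in zip(a, b))

    # Magnitude: square root of the sum of squared components.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))

    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")

    return dot / (norm_a * norm_b)


# Parallel vectors give a value very close to 1; orthogonal vectors give 0.
print(cosine_similarity([1, 2, 3], [2, 4, 6]))
print(cosine_similarity([1, 0], [0, 1]))
```

Raising an error for zero vectors is one possible design choice. The formula is undefined in that case, so some code returns 0 instead.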

Example of Cosine Similarity Calculation

Suppose we have two vectors A and B:

A = [2, 1, 0, 1, 1]

B = [1, 1, 1, 1, 0]

To calculate the cosine similarity between A and B, we first need to calculate the dot product of the two vectors. The dot product is calculated by multiplying the corresponding elements of the two vectors and then summing them up:

A . B = (2 x 1) + (1 x 1) + (0 x 1) + (1 x 1) + (1 x 0) = 4

Next, we need to calculate the magnitude of each vector. The magnitude is calculated by taking the square root of the sum of the squares of each element in the vector:

||A|| = sqrt(2² + 1² + 0² + 1² + 1²) = sqrt(7) ≈ 2.6458

||B|| = sqrt(1² + 1² + 1² + 1² + 0²) = sqrt(4) = 2

Finally, we can calculate the cosine similarity by dividing the dot product by the product of the magnitudes:

cosine_similarity(A, B) = (A . B) / (||A|| * ||B||) = 4 / (sqrt(7) * sqrt(4)) = 4 / (2 * sqrt(7)) ≈ 0.7559

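Therefore, the cosine similarity between vectors A and B is approximately 0.7559. On a scale where 1 means the vectors point in the same direction and 0 means they are orthogonal, this indicates that they are relatively similar.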
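The worked example can be checked step by step in Python. This is a sketch using only the standard library. The variable names are illustrative, and the final conversion to an angle is an extra step not in the calculation above, added to show what the value means geometrically.

```python
import math

A = [2, 1, 0, 1, 1]
B = [1, 1, 1, 1, 0]

# Step 1: dot product of the two vectors.
dot_product = sum(a * b for a, b in zip(A, B))

# Step 2: magnitude of each vector.
magnitude_a = math.sqrt(sum(a * a for a in A))
magnitude_b = math.sqrt(sum(b * b for b in B))

# Step 3: divide the dot product by the product of the magnitudes.
similarity = dot_product / (magnitude_a * magnitude_b)

# Extra: recover the angle between the vectors from its cosine.
angle_degrees = math.degrees(math.acos(similarity))

print("dot product:", dot_product)            # 4
print("||A||:", round(magnitude_a, 4))        # 2.6458
print("||B||:", round(magnitude_b, 4))        # 2.0
print("similarity:", round(similarity, 4))    # 0.7559
print("angle in degrees:", round(angle_degrees, 1))
```

The angle comes out at roughly 41 degrees, which matches the interpretation that the two vectors are fairly close in direction but not aligned.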

Applications of Cosine Similarity

Cosine similarity has a wide range of applications in various fields, including:

  1. Natural Language Processing: Cosine similarity is often used to measure the similarity between two documents or text snippets, which are first represented as vectors (for example, word counts or TF-IDF weights). It is used in search engines to rank documents by relevance to a query and in recommendation systems to suggest similar products or services.
  2. Image Processing: Cosine similarity is used to find similarities between two images by comparing their feature vectors.
  3. Music Recommendation: Cosine similarity is used to find similarities between two songs by comparing their audio features, such as tempo, beats, melody, and rhythm.
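As an illustration of the text use case, here is a minimal sketch that compares short documents by their word counts. The function name `text_cosine_similarity`, the whitespace tokenization, and the sample sentences are all assumptions made for this example. Real systems typically use more careful tokenization and weighting schemes such as TF-IDF.

```python
import math
from collections import Counter


def text_cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between two texts using raw word counts (illustrative)."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())

    # Only words present in both texts contribute to the dot product.
    shared_words = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[word] * counts_b[word] for word in shared_words)

    norm_a = math.sqrt(sum(count * count for count in counts_a.values()))
    norm_b = math.sqrt(sum(count * count for count in counts_b.values()))

    # Assumption: treat an empty text as having no similarity to anything.
    if norm_a == 0 or norm_b == 0:
        return 0.0

    return dot / (norm_a * norm_b)


query = "the cat sat on the mat"
print(text_cosine_similarity(query, "the cat lay on the rug"))
print(text_cosine_similarity(query, "stock prices fell sharply today"))
```

The first pair shares several words and scores well above zero, while the second pair shares none and scores 0. The same pattern applies to image or audio feature vectors once they are expressed as lists of numbers.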
