Distance and Similarity Measures

IA2-ready notes on Euclidean, Manhattan, Cosine, and Jaccard measures with worked examples.

Distance measures quantify dissimilarity; similarity measures quantify closeness.

1) Euclidean Distance (L2)

[Figure: Euclidean geometry overview]

For points $(x_1,y_1)$ and $(x_2,y_2)$:

$$d_E = \sqrt{(x_2-x_1)^2 + (y_2-y_1)^2}$$

For $n$ dimensions:

$$d_E(\mathbf{x},\mathbf{y}) = \sqrt{\sum_{i=1}^{n}(x_i-y_i)^2}$$

Example for points $(1,2)$ and $(5,6)$:

[Figure: Euclidean worked example]

$$d_E = \sqrt{(5-1)^2 + (6-2)^2} = \sqrt{32} = 4\sqrt{2}$$
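The n-dimensional formula can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `euclidean_distance` is an assumption made for this example and does not come from the notes.

```python
import math


def euclidean_distance(x, y):
    """Return the L2 distance between two equal-length sequences of numbers.

    Hypothetical helper written for illustration only.
    """
    if len(x) != len(y):
        raise ValueError("points must have the same number of dimensions")
    # Sum the squared coordinate differences, then take the square root.
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


# Worked example from the notes: points (1, 2) and (5, 6).
d = euclidean_distance((1, 2), (5, 6))
print(d)  # close to 4 * sqrt(2), about 5.657
```

The same function handles any number of dimensions because it iterates over coordinate pairs rather than naming x and y explicitly.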

2) Manhattan Distance (L1)

[Figure: Manhattan path interpretation]

For points $(x_1,y_1)$ and $(x_2,y_2)$:

$$d_M = |x_2-x_1| + |y_2-y_1|$$

Example from $(1,2)$ to $(3,5)$:

$$d_M = |3-1| + |5-2| = 2 + 3 = 5$$
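A short Python sketch of the same calculation, generalized to any number of coordinates. The function name `manhattan_distance` is a hypothetical choice for this example.

```python
def manhattan_distance(x, y):
    """Return the L1 distance between two equal-length sequences of numbers.

    Hypothetical helper written for illustration only.
    """
    if len(x) != len(y):
        raise ValueError("points must have the same number of dimensions")
    # Add up the absolute coordinate differences.
    return sum(abs(xi - yi) for xi, yi in zip(x, y))


# Worked example from the notes: from (1, 2) to (3, 5).
print(manhattan_distance((1, 2), (3, 5)))  # 5
```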

3) Worked Dataset

[Figure: dataset points on coordinate grid]

| Point | Attribute 1 | Attribute 2 |
|-------|-------------|-------------|
| $X_1$ | 1 | 2 |
| $X_2$ | 3 | 5 |
| $X_3$ | 2 | 0 |
| $X_4$ | 4 | 5 |

Manhattan Distance Matrix

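[Figure: Manhattan distance matrix diagram]

|       | $X_1$ | $X_2$ | $X_3$ | $X_4$ |
|-------|-------|-------|-------|-------|
| $X_1$ | 0 | 5 | 3 | 6 |
| $X_2$ | 5 | 0 | 6 | 1 |
| $X_3$ | 3 | 6 | 0 | 7 |
| $X_4$ | 6 | 1 | 7 | 0 |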

Euclidean Distance Matrix

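[Figure: Euclidean distance matrix diagram]

|       | $X_1$ | $X_2$ | $X_3$ | $X_4$ |
|-------|-------|-------|-------|-------|
| $X_1$ | 0 | 3.61 | 2.24 | 4.24 |
| $X_2$ | 3.61 | 0 | 5.10 | 1.00 |
| $X_3$ | 2.24 | 5.10 | 0 | 5.39 |
| $X_4$ | 4.24 | 1.00 | 5.39 | 0 |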
[Figures: two-point segment visualization; Δx and Δy geometry; coordinate axes reference]
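Both distance matrices for the worked dataset can be recomputed with a small Python sketch. The helper names (`manhattan`, `euclidean`, `distance_matrix`) are assumptions made for this example; the Euclidean values are rounded to two decimals.

```python
import math

# The four points from the worked dataset.
points = {"X1": (1, 2), "X2": (3, 5), "X3": (2, 0), "X4": (4, 5)}


def manhattan(p, q):
    """L1 distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def euclidean(p, q):
    """L2 distance: square root of the sum of squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def distance_matrix(points, dist):
    """Return a square list-of-lists of pairwise distances (hypothetical helper)."""
    names = list(points)
    return [[dist(points[a], points[b]) for b in names] for a in names]


manhattan_matrix = distance_matrix(points, manhattan)
euclidean_matrix = [
    [round(d, 2) for d in row] for row in distance_matrix(points, euclidean)
]

for name, row in zip(points, manhattan_matrix):
    print(name, row)
for name, row in zip(points, euclidean_matrix):
    print(name, row)
```

Both matrices are symmetric with a zero diagonal, so in practice only the upper triangle needs to be computed.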

4) Cosine Similarity and Cosine Distance

[Figure: cosine vector angle]

Cosine similarity compares direction, not magnitude:

$$\cos(\theta)=\frac{\mathbf{A}\cdot\mathbf{B}}{\|\mathbf{A}\|\|\mathbf{B}\|}$$

Cosine distance:

$$d_{\cos}=1-\cos(\theta)$$

Special cases:

  • $\cos(0^\circ)=1$ means maximum similarity.
  • $\cos(90^\circ)=0$ means orthogonal (no directional similarity).

Example with $\mathbf{A}=[1,2,0]$ and $\mathbf{B}=[2,1,1]$:

[Figure: genre axis example for cosine similarity]

$$\mathbf{A}\cdot\mathbf{B}=4, \quad \|\mathbf{A}\|=\sqrt{5}, \quad \|\mathbf{B}\|=\sqrt{6}$$

$$\cos(\theta)=\frac{4}{\sqrt{30}}\approx 0.73, \quad d_{\cos}\approx 0.27$$
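The cosine calculation can be checked with a minimal Python sketch. The function names are hypothetical, and raising an error for a zero vector is an assumption of this example, since the formula is undefined when either norm is zero.

```python
import math


def cosine_similarity(a, b):
    """Return cos(theta) between two equal-length vectors (illustrative helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    norm_b = math.sqrt(sum(bi * bi for bi in b))
    # Assumption: treat a zero vector as an error rather than returning 0.
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Return 1 - cos(theta)."""
    return 1.0 - cosine_similarity(a, b)


# Worked example from the notes: A = [1, 2, 0], B = [2, 1, 1].
sim = cosine_similarity([1, 2, 0], [2, 1, 1])
print(round(sim, 2), round(1 - sim, 2))  # 0.73 0.27
```

Scaling either vector by a positive constant leaves the result unchanged, which is what "compares direction, not magnitude" means in practice.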

5) Jaccard Similarity and Distance

For sets $A$ and $B$:

$$J(A,B)=\frac{|A\cap B|}{|A\cup B|}$$

$$d_J(A,B)=1-J(A,B)$$

Given:

$$A=\{1,2,4,5\},\quad B=\{2,3,5,7\}$$

$$A\cap B=\{2,5\},\quad A\cup B=\{1,2,3,4,5,7\}$$

$$J(A,B)=\frac{2}{6}=0.333\ldots, \quad d_J(A,B)=0.666\ldots$$
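Python's built-in `set` type maps directly onto the Jaccard definition. The sketch below uses hypothetical function names; rejecting two empty sets is an assumption of this example, because the ratio is 0/0 in that case and conventions differ.

```python
def jaccard_similarity(a, b):
    """Return |A ∩ B| / |A ∪ B| for two iterables of hashable items."""
    a, b = set(a), set(b)
    union = a | b
    # Assumption: two empty sets are treated as an error (0/0 is undefined).
    if not union:
        raise ValueError("Jaccard similarity is undefined for two empty sets")
    return len(a & b) / len(union)


def jaccard_distance(a, b):
    """Return 1 - J(A, B)."""
    return 1.0 - jaccard_similarity(a, b)


# Worked example from the notes.
A = {1, 2, 4, 5}
B = {2, 3, 5, 7}
j = jaccard_similarity(A, B)
print(round(j, 3), round(1 - j, 3))  # 0.333 0.667
```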

6) Which Measure to Use

  • Euclidean: dense numeric features where straight-line geometry is meaningful.
  • Manhattan: grid-like movement or absolute coordinate differences.
  • Cosine: text vectors and high-dimensional sparse vectors.
  • Jaccard: set or binary-presence features.

These measures are commonly used in k-nearest neighbors (KNN), clustering, recommendation, and retrieval.
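To illustrate why the choice matters, the sketch below finds the nearest neighbor of X1 in the worked dataset under three different measures. It is a hedged illustration, not a full KNN implementation; all function names are assumptions made for this example.

```python
import math

# The four points from the worked dataset.
points = {"X1": (1, 2), "X2": (3, 5), "X3": (2, 0), "X4": (4, 5)}


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def cosine_distance(p, q):
    # Assumes neither vector is all zeros (true for this dataset).
    dot = sum(a * b for a, b in zip(p, q))
    norms = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / norms


def nearest(query_name, dist):
    """Return the name of the closest other point under the given distance."""
    query = points[query_name]
    others = (name for name in points if name != query_name)
    return min(others, key=lambda name: dist(query, points[name]))


for label, dist in [
    ("euclidean", euclidean),
    ("manhattan", manhattan),
    ("cosine", cosine_distance),
]:
    print(label, nearest("X1", dist))
```

Euclidean and Manhattan distance both pick X3, the point closest in position, while cosine distance picks X2, whose direction from the origin is closest to that of X1.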
