Distance and Similarity Measures
IA2-ready notes on Euclidean, Manhattan, Cosine, and Jaccard measures with worked examples.
Distance measures quantify dissimilarity; similarity measures quantify closeness.
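Euclidean distance is the straight-line distance between two points. For points $(x_1, y_1)$ and $(x_2, y_2)$:
$$d_E = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$
For $n$ dimensions:
$$d_E(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$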
Example for points $(1, 2)$ and $(5, 6)$:
$$d_E = \sqrt{(5-1)^2 + (6-2)^2} = \sqrt{32} = 4\sqrt{2} \approx 5.66$$
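The $n$-dimensional formula translates directly into code. This is a minimal sketch, assuming points are equal-length sequences of numbers; `euclidean` is a hypothetical helper name, not a library function.

```python
import math
from typing import Sequence


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Hypothetical helper: square root of the summed squared coordinate differences."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


# Worked example from the notes: (1, 2) and (5, 6) gives sqrt(32) = 4*sqrt(2)
print(round(euclidean((1, 2), (5, 6)), 2))  # 5.66
```

The standard library's `math.dist(p, q)` (Python 3.8+) computes the same value.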
Manhattan distance sums the absolute differences along each axis. For points $(x_1, y_1)$ and $(x_2, y_2)$:
$$d_M = |x_2 - x_1| + |y_2 - y_1|$$
Example from $(1, 2)$ to $(3, 5)$:
$$d_M = |3 - 1| + |5 - 2| = 2 + 3 = 5$$
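A minimal sketch of the same calculation, generalised to any number of dimensions; `manhattan` is a hypothetical helper name, and points are assumed to be equal-length numeric sequences.

```python
from typing import Sequence


def manhattan(x: Sequence[float], y: Sequence[float]) -> float:
    """Hypothetical helper: sum of absolute coordinate differences (city-block distance)."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of dimensions")
    return sum(abs(xi - yi) for xi, yi in zip(x, y))


# Worked example from the notes: (1, 2) to (3, 5)
print(manhattan((1, 2), (3, 5)))  # 5
```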
Sample dataset of four points with two attributes:

| Point | Attribute 1 | Attribute 2 |
|---|---|---|
| X1 | 1 | 2 |
| X2 | 3 | 5 |
| X3 | 2 | 0 |
| X4 | 4 | 5 |

Manhattan distance matrix for these points:

|  | X1 | X2 | X3 | X4 |
|---|---|---|---|---|
| X1 | 0 | 5 | 3 | 6 |
| X2 | 5 | 0 | 6 | 1 |
| X3 | 3 | 6 | 0 | 7 |
| X4 | 6 | 1 | 7 | 0 |

Euclidean distance matrix for these points (rounded to two decimal places):

|  | X1 | X2 | X3 | X4 |
|---|---|---|---|---|
| X1 | 0 | 3.61 | 2.24 | 4.24 |
| X2 | 3.61 | 0 | 5.10 | 1.00 |
| X3 | 2.24 | 5.10 | 0 | 5.39 |
| X4 | 4.24 | 1.00 | 5.39 | 0 |
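Both distance matrices can be reproduced with a short sketch. It assumes the points are stored in a plain dict keyed by the table labels, uses the standard-library `math.dist` (Python 3.8+) for Euclidean distance, and defines a hypothetical `distance_matrix` helper that takes the metric as a parameter.

```python
import math

# Points X1..X4 from the dataset table: (Attribute 1, Attribute 2)
points = {
    "X1": (1, 2),
    "X2": (3, 5),
    "X3": (2, 0),
    "X4": (4, 5),
}


def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def distance_matrix(pts, metric):
    """Hypothetical helper: matrix[i][j] = metric(pts[i], pts[j]) for every pair."""
    return {i: {j: metric(p, q) for j, q in pts.items()} for i, p in pts.items()}


manhattan_m = distance_matrix(points, manhattan)
euclidean_m = distance_matrix(points, math.dist)  # math.dist requires Python 3.8+

for name, matrix in [("Manhattan", manhattan_m), ("Euclidean", euclidean_m)]:
    print(name)
    for label, row in matrix.items():
        print(label, [round(v, 2) for v in row.values()])
```

Both matrices are symmetric with a zero diagonal, so only the upper triangle actually needs computing.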
Cosine similarity compares direction, not magnitude:
$$\cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|}$$
Cosine distance:
$$d_{\cos} = 1 - \cos(\theta)$$
Special cases:
- $\cos(0^\circ) = 1$ means maximum similarity (the vectors point in the same direction).
- $\cos(90^\circ) = 0$ means the vectors are orthogonal (no directional similarity).
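Example with $A = [1, 2, 0]$ and $B = [2, 1, 1]$:
$$A \cdot B = (1)(2) + (2)(1) + (0)(1) = 4, \quad \|A\| = \sqrt{5}, \quad \|B\| = \sqrt{6}$$
$$\cos(\theta) = \frac{4}{\sqrt{5}\,\sqrt{6}} = \frac{4}{\sqrt{30}} \approx 0.73, \quad d_{\cos} \approx 0.27$$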
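A minimal standard-library sketch of cosine similarity and distance; `cosine_similarity` and `cosine_distance` are hypothetical helper names. Raising an error for a zero vector is a design choice assumed here, because the formula divides by the norms.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical helper: dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumed behaviour: the cosine is undefined when either vector has zero length
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical helper: 1 - cos(theta)."""
    return 1.0 - cosine_similarity(a, b)


A = [1, 2, 0]
B = [2, 1, 1]
print(round(cosine_similarity(A, B), 2))  # 0.73
print(round(cosine_distance(A, B), 2))    # 0.27

# Special cases: same direction with different magnitude, then orthogonal vectors
print(cosine_similarity([1, 2], [2, 4]))  # approximately 1.0 (floating point)
print(cosine_similarity([1, 0], [0, 1]))  # 0.0
```

If SciPy is available, `scipy.spatial.distance.cosine(u, v)` returns the cosine distance $1 - \cos(\theta)$ directly.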
Jaccard similarity measures the overlap between two sets. For sets $A$ and $B$:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
$$d_J(A, B) = 1 - J(A, B)$$
Given:
$$A = \{1, 2, 4, 5\}, \quad B = \{2, 3, 5, 7\}$$
$$A \cap B = \{2, 5\}, \quad A \cup B = \{1, 2, 3, 4, 5, 7\}$$
$$J(A, B) = \frac{2}{6} = 0.333\ldots, \quad d_J(A, B) = 0.666\ldots$$
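Python's built-in sets make the Jaccard calculation direct: `&` is intersection and `|` is union. This is a sketch with hypothetical helper names; treating two empty sets as identical (similarity 1) is an assumed convention, since the formula is otherwise $0/0$.

```python
def jaccard_similarity(a: set, b: set) -> float:
    """Hypothetical helper: |A intersect B| / |A union B|."""
    union = a | b
    if not union:
        return 1.0  # assumed convention: two empty sets count as identical
    return len(a & b) / len(union)


def jaccard_distance(a: set, b: set) -> float:
    """Hypothetical helper: 1 - J(A, B)."""
    return 1.0 - jaccard_similarity(a, b)


A = {1, 2, 4, 5}
B = {2, 3, 5, 7}
print(sorted(A & B), sorted(A | B))        # [2, 5] [1, 2, 3, 4, 5, 7]
print(round(jaccard_similarity(A, B), 3))  # 0.333
print(round(jaccard_distance(A, B), 3))    # 0.667
```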
Typical uses of each measure:
- Euclidean: dense numeric features where straight-line geometry is meaningful.
- Manhattan: grid-like movement or absolute coordinate differences.
- Cosine: text vectors and high-dimensional sparse vectors.
- Jaccard: set or binary-presence features.
These metrics are commonly used in k-nearest neighbors (KNN), clustering, recommendation, and retrieval.
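The choice of metric can change which point counts as "nearest". The sketch below is not a full KNN classifier; it is a 1-nearest-neighbor lookup over the X1 to X4 dataset, with the metric passed in as a parameter. `nearest_neighbor` and the other helpers are hypothetical names.

```python
import math

# Points X1..X4 from the dataset table
points = {"X1": (1, 2), "X2": (3, 5), "X3": (2, 0), "X4": (4, 5)}


def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def cosine_distance(p, q):
    """1 - cos(theta) for 2-D points; assumes neither point is the origin."""
    dot = sum(a * b for a, b in zip(p, q))
    return 1.0 - dot / (math.hypot(*p) * math.hypot(*q))


def nearest_neighbor(query_name, pts, metric):
    """Hypothetical helper: name of the closest other point under `metric`."""
    query = pts[query_name]
    others = (name for name in pts if name != query_name)
    return min(others, key=lambda name: metric(query, pts[name]))


for label, metric in [("Euclidean", math.dist), ("Manhattan", manhattan), ("Cosine", cosine_distance)]:
    print(label, nearest_neighbor("X1", points, metric))
```

Under Euclidean and Manhattan distance the nearest neighbor of X1 is X3, but under cosine distance it is X2, because X2 points in almost the same direction as X1 even though it lies farther away.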