Flajolet-Martin Algorithm
IA2-ready notes on distinct-count estimation in data streams using the Flajolet-Martin method.
Flajolet-Martin (FM) Algorithm
Flajolet-Martin is a probabilistic algorithm that estimates the number of distinct elements in a stream using very little memory.
Core Idea
If a hash value ends with many trailing zeros, that event is rare. Observing a maximum of trailing zeros suggests the distinct count is on the order of .
Basic Procedure
- Choose a hash function that spreads values uniformly.
- For each stream item , compute .
- Let be the number of trailing zeros in the binary form of .
- Track over the stream.
- Estimate distinct count as .
Example 1
Stream:
Hash function:
Unique hash results encountered:
Binary forms (3-bit):
Trailing zero counts:
From the class-note convention, was treated as 0 trailing zeros.
So:
Estimated distinct elements: 4 (which matches ).
Example 2
Given stream:
Hash function:
Computed values:
5-bit binary:
Maximum trailing zeros is for :
Practical Note for Exams
Single-hash FM can be noisy. In practice, multiple hash functions (or multiple registers) are used and estimates are combined using mean/median-like aggregation to reduce variance.
Quick Revision
- FM is one-pass and memory-efficient.
- It estimates cardinality, not exact count.
- Formula to remember: , where is max trailing-zero count.