ESE begin 27 April 2026. View Timetable
Logo
CoreCuratedBDA

Flajolet-Martin Algorithm

IA2-ready notes on distinct-count estimation in data streams using the Flajolet-Martin method.

Flajolet-Martin (FM) Algorithm

Flajolet-Martin is a probabilistic algorithm that estimates the number of distinct elements in a stream using very little memory.

Core Idea

If a hash value ends with many trailing zeros, that event is rare. Observing a maximum of RR trailing zeros suggests the distinct count is on the order of 2R2^R.

Basic Procedure

  1. Choose a hash function h(x)h(x) that spreads values uniformly.
  2. For each stream item xx, compute h(x)h(x).
  3. Let r(x)r(x) be the number of trailing zeros in the binary form of h(x)h(x).
  4. Track R=maxr(x)R=\max r(x) over the stream.
  5. Estimate distinct count as D^2R\hat{D}\approx 2^R.

Example 1

Stream:

x=1,3,2,1,2,3,4,3,1,2,3,1x = 1,3,2,1,2,3,4,3,1,2,3,1

Hash function:

h(x)=(6x+1)mod5h(x)=(6x+1)\bmod 5

Unique hash results encountered:

h(1)=2,h(2)=3,h(3)=4,h(4)=0h(1)=2,\quad h(2)=3,\quad h(3)=4,\quad h(4)=0

Binary forms (3-bit):

2=010,  3=011,  4=100,  0=0002=010,\; 3=011,\; 4=100,\; 0=000

Trailing zero counts:

0101,0110,1002010\to1,\quad 011\to0,\quad 100\to2

From the class-note convention, 000000 was treated as 0 trailing zeros.

So:

R=2D^=22=4R=2\quad\Rightarrow\quad \hat{D}=2^2=4

Estimated distinct elements: 4 (which matches {1,2,3,4}\{1,2,3,4\}).

Example 2

Given stream:

4,5,9,1,6,3,74,5,9,1,6,3,7

Hash function:

h(x)=(x+6)mod32h(x)=(x+6)\bmod 32

Computed values:

h(4)=10,  h(5)=11,  h(9)=15,  h(1)=7,  h(6)=12,  h(3)=9,  h(7)=13h(4)=10,\; h(5)=11,\; h(9)=15,\; h(1)=7,\; h(6)=12,\; h(3)=9,\; h(7)=13

5-bit binary:

01010,  01011,  01111,  00111,  01100,  01001,  0110101010,\;01011,\;01111,\;00111,\;01100,\;01001,\;01101

Maximum trailing zeros is for 0110001100:

R=2D^=22=4R=2\quad\Rightarrow\quad \hat{D}=2^2=4

Practical Note for Exams

Single-hash FM can be noisy. In practice, multiple hash functions (or multiple registers) are used and estimates are combined using mean/median-like aggregation to reduce variance.

Quick Revision

  • FM is one-pass and memory-efficient.
  • It estimates cardinality, not exact count.
  • Formula to remember: D^2R\hat{D}\approx 2^R, where RR is max trailing-zero count.

On this page