ESE begin 27 April 2026. View Timetable
Logo
CoreCuratedBDA

Bloom Filter and Hashing

IA2 notes on Bloom filters, false positives, hash functions, and key modulo operations.

Bloom Filter and Hashing

Bloom filter is a probabilistic data structure used for membership testing.

  • If the filter says "not present", the element is definitely not present.
  • If the filter says "present", the element may or may not actually be present.

So Bloom filters have false positives, but no false negatives.

1) IA2 Worked Example

Given:

m=5,h1(x)=xmod5,h2(x)=(2x+3)mod5m=5,\quad h_1(x)=x\bmod 5,\quad h_2(x)=(2x+3)\bmod 5

Start bit array:

[0,0,0,0,0][0,0,0,0,0]

Insert x=9x=9:

h1(9)=4,h2(9)=1h_1(9)=4,\quad h_2(9)=1

Set positions 4 and 1 to 1:

[0,1,0,0,1][0,1,0,0,1]

Insert x=11x=11:

h1(11)=1,h2(11)=0h_1(11)=1,\quad h_2(11)=0

Set positions 1 and 0 to 1:

[1,1,0,0,1][1,1,0,0,1]

Query x=16x=16

h1(16)=1,h2(16)=0h_1(16)=1,\quad h_2(16)=0

Both positions are 1, so Bloom filter reports "possibly present".

If 16 was never inserted, this is a false positive.

2) False Positive Intuition

As more elements are inserted, more bits become 1, and false-positive probability increases.

Common approximation:

P(false positive)(1ekn/m)kP(\text{false positive})\approx\left(1-e^{-kn/m}\right)^k

Where:

  • mm = number of bits
  • kk = number of hash functions
  • nn = number of inserted elements

3) Hashing Notes

A hash function maps variable-length input to a fixed-size integer-like output (hash value).

MurmurHash is a popular non-cryptographic hash function used for speed and good distribution.

Example value from class notes:

hash("abd",123)=454173339hash("abd",123)=454173339

4) Useful Modulo Operations

10mod3=110\bmod 3=1 15mod3=015\bmod 3=0 128mod5=3128\bmod 5=3 31345mod419=33931345\bmod 419=339

5) Quick Revision

  • Bloom filter answers membership queries in O(k)O(k) time.
  • It is space-efficient for large set-membership problems.
  • It is widely used in databases, caching layers, and stream pipelines.

On this page