Bloom Filter and Hashing
IA2 notes on Bloom filters, false positives, hash functions, and key modulo operations.
Bloom Filter and Hashing
Bloom filter is a probabilistic data structure used for membership testing.
- If the filter says "not present", the element is definitely not present.
- If the filter says "present", the element may or may not actually be present.
So Bloom filters have false positives, but no false negatives.
1) IA2 Worked Example
Given:
Start bit array:
Insert :
Set positions 4 and 1 to 1:
Insert :
Set positions 1 and 0 to 1:
Query
Both positions are 1, so Bloom filter reports "possibly present".
If 16 was never inserted, this is a false positive.
2) False Positive Intuition
As more elements are inserted, more bits become 1, and false-positive probability increases.
Common approximation:
Where:
- = number of bits
- = number of hash functions
- = number of inserted elements
3) Hashing Notes
A hash function maps variable-length input to a fixed-size integer-like output (hash value).
MurmurHash is a popular non-cryptographic hash function used for speed and good distribution.
Example value from class notes:
4) Useful Modulo Operations
5) Quick Revision
- Bloom filter answers membership queries in time.
- It is space-efficient for large set-membership problems.
- It is widely used in databases, caching layers, and stream pipelines.