CoreCuratedBDA
PCY Algorithm
Complete and consistent IA2 walkthrough for PCY frequent-pair mining.
PCY Algorithm (Park-Chen-Yu)
PCY is a memory-efficient algorithm for frequent pair mining in large transaction datasets.
Problem Setup
- Minimum support (threshold) = 3
- Pair hash function:
Transactions:
Pass 1: Frequent Single Items
| Item | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Support | 3 | 4 | 6 | 6 | 4 | 4 |
All items are frequent because support is at least 3.
Pass 1: Hash Buckets for Pairs
Every pair in every transaction is hashed into 10 buckets.
| Bucket | Count |
|---|---|
| 0 | 6 |
| 1 | 0 |
| 2 | 5 |
| 3 | 3 |
| 4 | 3 |
| 5 | 2 |
| 6 | 3 |
| 7 | 0 |
| 8 | 5 |
| 9 | 0 |
Frequent buckets (count at least 3):
Candidate Pairs for Pass 2
Candidate rule in PCY:
- Both items must be frequent singletons.
- Pair must hash to a frequent bucket.
Candidates that survive:
Pass 2: Exact Pair Supports
| Pair | Support |
|---|---|
| (2,5) | 1 |
| (4,5) | 3 |
| (5,6) | 2 |
| (1,2) | 1 |
| (2,6) | 1 |
| (3,4) | 3 |
| (1,3) | 3 |
| (1,4) | 1 |
| (4,6) | 2 |
| (1,6) | 1 |
| (2,3) | 2 |
| (2,4) | 3 |
| (3,6) | 2 |
Final frequent 2-itemsets (support at least 3):
Why PCY is Useful
- It reduces candidate explosion by filtering with hash-bucket bitmaps.
- It saves memory compared to counting every possible pair directly.
- It fits naturally with two-pass frequent itemset mining workflows.