PCY Algorithm

Complete and consistent IA2 walkthrough for PCY frequent-pair mining.

PCY Algorithm (Park-Chen-Yu)

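PCY is a memory-efficient, two-pass algorithm for mining frequent pairs in large transaction datasets. While Pass 1 counts single items, it also hashes every pair into a small table of bucket counts; in Pass 2, only pairs whose bucket reached the support threshold are counted exactly.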

Problem Setup

Minimum support (threshold) = 3
Pair hash function: $h(i,j)=(i\times j)\bmod 10$
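As a quick check of the hash function, here it is evaluated on three pairs that occur in the transactions of this walkthrough. Because multiplication is commutative, $h(i,j)=h(j,i)$, so the order of the two items does not matter.

$$
\begin{aligned}
h(1,3)&=(1\times 3)\bmod 10=3\\
h(3,4)&=(3\times 4)\bmod 10=2\\
h(4,5)&=(4\times 5)\bmod 10=0
\end{aligned}
$$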

Transactions:

$$
\begin{aligned}
T_1&=\{1,2,3\}\\
T_2&=\{2,3,4\}\\
T_3&=\{3,4,5\}\\
T_4&=\{4,5,6\}\\
T_5&=\{1,3,6\}\\
T_6&=\{2,4,6\}\\
T_7&=\{1,3,4\}\\
T_8&=\{2,4,5\}\\
T_9&=\{3,5,6\}
\end{aligned}
$$

Pass 1: Frequent Single Items

Item	1	2	3	4	5	6
Support	3	4	6	6	4	4

All six items are frequent because each has a support of at least 3.
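The item supports above can be reproduced with a short Python sketch. The names `transactions`, `MIN_SUPPORT`, `item_counts`, and `frequent_items` are illustrative choices, not part of the original walkthrough.

```python
from collections import Counter

# The nine transactions from the problem setup.
transactions = [
    {1, 2, 3}, {2, 3, 4}, {3, 4, 5}, {4, 5, 6}, {1, 3, 6},
    {2, 4, 6}, {1, 3, 4}, {2, 4, 5}, {3, 5, 6},
]
MIN_SUPPORT = 3  # minimum support threshold from the setup

# Count how many transactions contain each item.
item_counts = Counter(item for t in transactions for item in t)

# An item is frequent if its support reaches the threshold.
frequent_items = {item for item, c in item_counts.items() if c >= MIN_SUPPORT}

print(sorted(item_counts.items()))  # [(1, 3), (2, 4), (3, 6), (4, 6), (5, 4), (6, 4)]
print(sorted(frequent_items))       # [1, 2, 3, 4, 5, 6]
```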

Pass 1: Hash Buckets for Pairs

During the same pass, every pair in every transaction is hashed to one of 10 buckets (0 to 9), and that bucket's count is incremented.

Bucket	Count
0	6
1	0
2	5
3	3
4	3
5	2
6	3
7	0
8	5
9	0

Frequent buckets (count at least 3):

$\{0,2,3,4,6,8\}$
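The bucket table and the set of frequent buckets can be checked with the sketch below. It assumes each transaction's pairs are generated with `itertools.combinations`; the names `pair_hash`, `bucket_counts`, and `bitmap` are hypothetical. The `bitmap` list stands in for the one-bit-per-bucket summary that PCY carries into Pass 2.

```python
from itertools import combinations

transactions = [
    {1, 2, 3}, {2, 3, 4}, {3, 4, 5}, {4, 5, 6}, {1, 3, 6},
    {2, 4, 6}, {1, 3, 4}, {2, 4, 5}, {3, 5, 6},
]
NUM_BUCKETS = 10
MIN_SUPPORT = 3


def pair_hash(i, j):
    """Hash function from the setup: h(i, j) = (i * j) mod 10."""
    return (i * j) % NUM_BUCKETS


# Hash every pair of every transaction and increment its bucket.
bucket_counts = [0] * NUM_BUCKETS
for t in transactions:
    for i, j in combinations(sorted(t), 2):
        bucket_counts[pair_hash(i, j)] += 1

# Compress the counts to one boolean per bucket (the PCY bitmap).
bitmap = [count >= MIN_SUPPORT for count in bucket_counts]
frequent_buckets = [b for b, is_frequent in enumerate(bitmap) if is_frequent]

print(bucket_counts)     # [6, 0, 5, 3, 3, 2, 3, 0, 5, 0]
print(frequent_buckets)  # [0, 2, 3, 4, 6, 8]
```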

Candidate Pairs for Pass 2

Candidate rule in PCY: a pair $(i,j)$ is counted in Pass 2 only if both conditions hold.

Both items $i$ and $j$ are frequent singletons.
The pair hashes to a frequent bucket.

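Candidates that survive (13 of the 14 distinct pairs that occur in the transactions; only $(3,5)$, which hashes to the infrequent bucket 5, is eliminated):

$\{(2,5),(4,5),(5,6),(1,2),(2,6),(3,4),(1,3),(1,4),(4,6),(1,6),(2,3),(2,4),(3,6)\}$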
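The candidate rule can be sketched as a filter over pairs of frequent items. This sketch enumerates all 15 pairs of the six frequent items, which is an assumption made for illustration; a typical PCY implementation only generates pairs that actually appear in transactions during Pass 2. As a result, the pruned list here also contains $(1,5)$, which never occurs in any transaction and also hashes to bucket 5.

```python
from itertools import combinations

# Results of Pass 1, taken from the walkthrough.
frequent_items = [1, 2, 3, 4, 5, 6]
frequent_buckets = {0, 2, 3, 4, 6, 8}


def pair_hash(i, j):
    """Hash function from the setup: h(i, j) = (i * j) mod 10."""
    return (i * j) % 10


# Condition 1 holds by construction: pairs are built from frequent items only.
all_pairs = list(combinations(frequent_items, 2))

# Condition 2: keep a pair only if it hashes to a frequent bucket.
candidates = [p for p in all_pairs if pair_hash(*p) in frequent_buckets]
pruned = [p for p in all_pairs if pair_hash(*p) not in frequent_buckets]

print(len(candidates))  # 13
print(pruned)           # [(1, 5), (3, 5)]
```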

Pass 2: Exact Pair Supports

Pair	Support
(2,5)	1
(4,5)	3
(5,6)	2
(1,2)	1
(2,6)	1
(3,4)	3
(1,3)	3
(1,4)	1
(4,6)	2
(1,6)	1
(2,3)	2
(2,4)	3
(3,6)	2

Final frequent 2-itemsets (support at least 3):

$\{(1,3),(2,4),(3,4),(4,5)\}$
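Pass 2 can be sketched as a second scan of the transactions that keeps an exact counter only for pairs passing the candidate rule. The Pass 1 results are hard-coded here so the block runs on its own; the variable names are illustrative.

```python
from collections import Counter
from itertools import combinations

transactions = [
    {1, 2, 3}, {2, 3, 4}, {3, 4, 5}, {4, 5, 6}, {1, 3, 6},
    {2, 4, 6}, {1, 3, 4}, {2, 4, 5}, {3, 5, 6},
]
MIN_SUPPORT = 3

# Results of Pass 1, taken from the walkthrough.
frequent_items = {1, 2, 3, 4, 5, 6}
frequent_buckets = {0, 2, 3, 4, 6, 8}


def pair_hash(i, j):
    """Hash function from the setup: h(i, j) = (i * j) mod 10."""
    return (i * j) % 10


pair_counts = Counter()
for t in transactions:
    # Drop infrequent items first, then count only pairs in frequent buckets.
    kept = sorted(i for i in t if i in frequent_items)
    for pair in combinations(kept, 2):
        if pair_hash(*pair) in frequent_buckets:
            pair_counts[pair] += 1

frequent_pairs = sorted(p for p, c in pair_counts.items() if c >= MIN_SUPPORT)
print(frequent_pairs)  # [(1, 3), (2, 4), (3, 4), (4, 5)]
```

Note that $(3,5)$ never receives a counter, since bucket 5 is infrequent.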

Why PCY is Useful

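It limits candidate explosion: between passes the bucket counts are compressed into a bitmap (one bit per bucket), and only pairs that hash to a frequent bucket are counted in Pass 2.
It saves memory compared with counting every pair of frequent items directly, because pairs in infrequent buckets never get a counter. In this small example only $(3,5)$ is filtered out; the savings grow when many buckets are infrequent.
It fits naturally into two-pass frequent itemset mining workflows.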
