ESE begin 27 April 2026. View Timetable
Logo
CoreCuratedBDA

PCY Algorithm

Complete and consistent IA2 walkthrough for PCY frequent-pair mining.

PCY Algorithm (Park-Chen-Yu)

PCY is a memory-efficient algorithm for frequent pair mining in large transaction datasets.

Problem Setup

  • Minimum support (threshold) = 3
  • Pair hash function: h(i,j)=(i×j)mod10h(i,j)=(i\times j)\bmod 10

Transactions:

T1={1,2,3}T2={2,3,4}T3={3,4,5}T4={4,5,6}T5={1,3,6}T6={2,4,6}T7={1,3,4}T8={2,4,5}T9={3,5,6}\begin{aligned} T_1&=\{1,2,3\}\\ T_2&=\{2,3,4\}\\ T_3&=\{3,4,5\}\\ T_4&=\{4,5,6\}\\ T_5&=\{1,3,6\}\\ T_6&=\{2,4,6\}\\ T_7&=\{1,3,4\}\\ T_8&=\{2,4,5\}\\ T_9&=\{3,5,6\} \end{aligned}

Pass 1: Frequent Single Items

Item123456
Support346644

All items are frequent because support is at least 3.

Pass 1: Hash Buckets for Pairs

Every pair in every transaction is hashed into 10 buckets.

BucketCount
06
10
25
33
43
52
63
70
85
90

Frequent buckets (count at least 3):

{0,2,3,4,6,8}\{0,2,3,4,6,8\}

Candidate Pairs for Pass 2

Candidate rule in PCY:

  1. Both items must be frequent singletons.
  2. Pair must hash to a frequent bucket.

Candidates that survive:

{(2,5),(4,5),(5,6),(1,2),(2,6),(3,4),(1,3),(1,4),(4,6),(1,6),(2,3),(2,4),(3,6)}\{(2,5),(4,5),(5,6),(1,2),(2,6),(3,4),(1,3),(1,4),(4,6),(1,6),(2,3),(2,4),(3,6)\}

Pass 2: Exact Pair Supports

PairSupport
(2,5)1
(4,5)3
(5,6)2
(1,2)1
(2,6)1
(3,4)3
(1,3)3
(1,4)1
(4,6)2
(1,6)1
(2,3)2
(2,4)3
(3,6)2

Final frequent 2-itemsets (support at least 3):

{(1,3),(2,4),(3,4),(4,5)}\{(1,3),(2,4),(3,4),(4,5)\}

Why PCY is Useful

  • It reduces candidate explosion by filtering with hash-bucket bitmaps.
  • It saves memory compared to counting every possible pair directly.
  • It fits naturally with two-pass frequent itemset mining workflows.

On this page