Incorporating occupancy into frequent pattern mining for high quality pattern recommendation

Authors:
Linpeng Tang;Lei Zhang;Ping Luo;Min Wang
Affiliations:
Shanghai Jiao Tong University, Shanghai, China;University of Science and Technology of China, Heifei, China;HP Labs China, Beijing, China;HP Labs China, Beijing, China
Venue:
Proceedings of the 21st ACM international conference on Information and knowledge management
Year:
2012

Citing 17
Cited 0

Mining association rules between sets of items in large databases

SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Fast discovery of association rules

Advances in knowledge discovery and data mining
Mining frequent patterns without candidate generation

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Mining Sequential Patterns

ICDE '95 Proceedings of the Eleventh International Conference on Data Engineering
Discovering Frequent Closed Itemsets for Association Rules

ICDT '99 Proceedings of the 7th International Conference on Database Theory
Mining Frequent Item Sets with Convertible Constraints

Proceedings of the 17th International Conference on Data Engineering
MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases

Proceedings of the 17th International Conference on Data Engineering
State of the art of graph-based data mining

ACM SIGKDD Explorations Newsletter
Efficient closed pattern mining in the presence of tough block constraints

Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
On Closed Constrained Frequent Pattern Mining

ICDM '04 Proceedings of the Fourth IEEE International Conference on Data Mining
TFP: An Efficient Algorithm for Mining Top-K Frequent Closed Itemsets

IEEE Transactions on Knowledge and Data Engineering
GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets

Data Mining and Knowledge Discovery
Frequent pattern mining: current status and future directions

Data Mining and Knowledge Discovery
Closed patterns meet n-ary relations

ACM Transactions on Knowledge Discovery from Data (TKDD)
Mining constraint-based patterns using automatic relaxation

Intelligent Data Analysis
Learning approximate MRFs from large transactional data

ICML'06 Proceedings of the 2006 conference on Statistical network analysis
Krimp: mining itemsets that compress

Data Mining and Knowledge Discovery

Quantified Score

Hi-index	0.00

Visualization

Abstract

Mining interesting patterns from transaction databases has attracted a lot of research interest for more than a decade. Most of those studies use frequency, the number of times a pattern appears in a transaction database, as the key measure for pattern interestingness. In this paper, we introduce a new measure of pattern interestingness, occupancy. The measure of occupancy is motivated by some real-world pattern recommendation applications which require that any interesting pattern X should occupy a large portion of the transactions it appears in. Namely, for any supporting transaction t of pattern X, the number of items in X should be close to the total number of items in t. In these pattern recommendation applications, patterns with higher occupancy may lead to higher recall while patterns with higher frequency lead to higher precision. With the definition of occupancy we call a pattern dominant if its occupancy is above a user-specified threshold. Then, our task is to identify the qualified patterns which are both frequent and dominant. Additionally, we also formulate the problem of mining top-k qualified patterns: finding the qualified patterns with the top-k values of any function (e.g. weighted sum of both occupancy and support). The challenge to these tasks is that the monotone or anti-monotone property does not hold on occupancy. In other words, the value of occupancy does not increase or decrease monotonically when we add more items to a given itemset. Thus, we propose an algorithm called DOFIA (DOminant and Frequent Itemset mining Algorithm), which explores the upper bound properties on occupancy to reduce the search process. The tradeoff between bound tightness and computational complexity is also systematically addressed. Finally, we show the effectiveness of DOFIA in a real-world application on print-area recommendation for Web pages, and also demonstrate the efficiency of DOFIA on several large synthetic data sets.