Mining interesting patterns from transaction databases has attracted considerable research interest for more than a decade. Most of these studies use frequency, the number of times a pattern appears in a transaction database, as the key measure of pattern interestingness. In this paper, we introduce a new measure of pattern interestingness, occupancy. Occupancy is motivated by real-world pattern recommendation applications, which require that an interesting pattern X occupy a large portion of the transactions it appears in. That is, for any supporting transaction t of pattern X, the number of items in X should be close to the total number of items in t. In these applications, patterns with higher occupancy may lead to higher recall, while patterns with higher frequency lead to higher precision. Given the definition of occupancy, we call a pattern dominant if its occupancy is above a user-specified threshold. Our task is then to identify the qualified patterns, those that are both frequent and dominant. We also formulate the problem of mining top-k qualified patterns: finding the qualified patterns with the top-k values of a given function (e.g., a weighted sum of occupancy and support). The challenge in these tasks is that neither the monotone nor the anti-monotone property holds for occupancy. In other words, the occupancy of an itemset neither increases nor decreases monotonically as more items are added to it. We therefore propose an algorithm called DOFIA (DOminant and Frequent Itemset mining Algorithm), which exploits upper-bound properties of occupancy to reduce the search. The tradeoff between bound tightness and computational complexity is also systematically addressed. Finally, we show the effectiveness of DOFIA in a real-world application, print-area recommendation for Web pages, and demonstrate its efficiency on several large synthetic data sets.
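To make the notions of occupancy and qualified patterns concrete, the following is a minimal Python sketch. It rests on assumptions not stated in the abstract: occupancy is taken here to be the arithmetic mean of |X| / |t| over the supporting transactions t of X, support is an absolute count, and the function names (`support`, `occupancy`, `qualified_patterns`) and the toy data are purely illustrative. The enumeration is brute force and does not reproduce DOFIA or its upper-bound pruning.

```python
from itertools import combinations


def supporting(transactions, pattern):
    """Return the transactions that contain every item of `pattern`."""
    return [t for t in transactions if pattern <= t]


def support(transactions, pattern):
    """Absolute support: number of transactions containing `pattern`."""
    return len(supporting(transactions, pattern))


def occupancy(transactions, pattern):
    """Assumed definition: mean of |X| / |t| over supporting transactions."""
    sup = supporting(transactions, pattern)
    if not sup:
        return 0.0
    return sum(len(pattern) / len(t) for t in sup) / len(sup)


def qualified_patterns(transactions, min_support, min_occupancy):
    """Brute-force search for patterns that are both frequent and dominant.

    Every non-empty itemset is enumerated, so this is exponential in the
    number of items and only suitable for tiny illustrative inputs.
    """
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            pattern = frozenset(combo)
            s = support(transactions, pattern)
            if s < min_support:
                continue
            o = occupancy(transactions, pattern)
            if o >= min_occupancy:
                result[pattern] = (s, o)
    return result


# Hypothetical toy database of four transactions.
transactions = [
    frozenset("a"),
    frozenset("abcd"),
    frozenset("cd"),
    frozenset("cefg"),
]

# Adding an item can lower occupancy ...
occ_a = occupancy(transactions, frozenset("a"))    # 0.625
occ_ab = occupancy(transactions, frozenset("ab"))  # 0.5
# ... or raise it, so occupancy is neither monotone nor anti-monotone.
occ_c = occupancy(transactions, frozenset("c"))    # 1/3
occ_cd = occupancy(transactions, frozenset("cd"))  # 0.75

qualified = qualified_patterns(transactions, min_support=2, min_occupancy=0.5)
for pattern, (s, o) in qualified.items():
    print(sorted(pattern), s, round(o, 3))
```

On this toy data, {c} fails the occupancy threshold while its superset {c, d} passes, which illustrates why a subset's failure cannot be used to prune supersets the way it can for support alone.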