A new concise representation of frequent itemsets using generators and a positive border

Authors:
Guimei Liu;Jinyan Li;Limsoon Wong
Affiliations:
National University of Singapore, School of Computing, COM1, Law Link, 117590, Singapore, Singapore;Nanyang Technological University, School of Computer Engineering, COM1, Law Link, 117590, Singapore, Singapore;National University of Singapore, School of Computing, COM1, Law Link, 117590, Singapore, Singapore
Venue:
Knowledge and Information Systems
Year:
2008

Citing 20
Cited 5

Mining association rules between sets of items in large databases

SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Efficiently mining long patterns from databases

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Using association rules for product assortment decisions: a case study

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining frequent patterns without candidate generation

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
A condensed representation to find frequent patterns

PODS '01 Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Mining frequent patterns with counting inference

ACM SIGKDD Explorations Newsletter - Special issue on “Scalable data mining algorithms”
Real world performance of association rule algorithms

Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Free-Sets: A Condensed Representation of Boolean Data for the Approximation of Frequency Queries

Data Mining and Knowledge Discovery
Concise Representation of Frequent Patterns Based on Disjunction-Free Generators

ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Mining All Non-derivable Frequent Itemsets

PKDD '02 Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery
Fast Algorithms for Mining Association Rules in Large Databases

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Concise Representation of Frequent Patterns Based on Generalized Disjunction-Free Generators

PAKDD '02 Proceedings of the 6th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
On Computing Condensed Frequent Pattern Bases

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Mining Top.K Frequent Closed Patterns without Minimum Support

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
CLOSET+: searching for the best strategies for mining frequent closed itemsets

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Carpenter: finding closed patterns in long biological datasets

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
TSP: Mining top-k closed sequential patterns

Knowledge and Information Systems
Mining condensed frequent-pattern bases

Knowledge and Information Systems
On condensed representations of constrained frequent patterns

Knowledge and Information Systems
Catch the moment: maintaining closed frequent itemsets over a data stream sliding window

Knowledge and Information Systems

Sweeping the disjunctive search space towards mining new exact concise representations of frequent itemsets

Data & Knowledge Engineering
Frequent item set mining

Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
Looking for a structural characterization of the sparseness measure of (frequent closed) itemset contexts

Information Sciences: an International Journal
Solving inverse frequent itemset mining with infrequency constraints via large-scale linear programs

ACM Transactions on Knowledge Discovery from Data (TKDD)
Key roles of closed sets and minimal generators in concise representations of frequent patterns

Intelligent Data Analysis

Quantified Score

Hi-index	0.00

Visualization

Abstract

A complete set of frequent itemsets can get undesirably large due to redundancy when the minimum support threshold is low or when the database is dense. Several concise representations have been previously proposed to eliminate the redundancy. Generator based representations rely on a negative border to make the representation lossless. However, the number of itemsets on a negative border sometimes even exceeds the total number of frequent itemsets. In this paper, we propose to use a positive border together with frequent generators to form a lossless representation. A positive border is usually orders of magnitude smaller than its corresponding negative border. A set of frequent generators plus its positive border is always no larger than the corresponding complete set of frequent itemsets, thus it is a true concise representation. The generalized form of this representation is also proposed. We develop an efficient algorithm, called GrGrowth, to mine generators and positive borders as well as their generalizations. The GrGrowth algorithm uses the depth-first-search strategy to explore the search space, which is much more efficient than the breadth-first-search strategy adopted by most of the existing generator mining algorithms. Our experiment results show that the GrGrowth algorithm is significantly faster than level-wise algorithms for mining generator based representations, and is comparable to the state-of-the-art algorithms for mining frequent closed itemsets.