Most known frequent item set mining algorithms work by enumerating candidate item sets and pruning infrequent candidates. An alternative method, which works by intersecting transactions, is much less researched. To the best of our knowledge, there are only two basic algorithms: a cumulative scheme, which is based on a repository with which new transactions are intersected, and the Carpenter algorithm, which enumerates and intersects candidate transaction sets. These approaches yield the set of so-called closed frequent item sets, since any such item set can be represented as the intersection of some subset of the given transactions. In this paper we describe a considerably improved implementation scheme of the cumulative approach, which relies on a prefix tree representation of the already found intersections. In addition, we present an improved way of implementing the Carpenter algorithm. We demonstrate that on specific data sets, which occur particularly often in the area of gene expression analysis, our implementations significantly outperform enumeration approaches to frequent item set mining.
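
To make the cumulative scheme concrete, the following is a minimal sketch in Python. It is an illustration under stated assumptions, not the implementation described in the paper: the repository is a plain dictionary from item sets to support counts rather than a prefix tree, and the function name `closed_frequent_item_sets` and the parameter `min_support` are hypothetical names chosen for this example. Each new transaction is intersected with every item set already in the repository, and the support of each resulting intersection is the largest support among the sets that produced it, plus one for the new transaction.

```python
from typing import Dict, FrozenSet, Iterable, Set


def closed_frequent_item_sets(
    transactions: Iterable[Set[str]], min_support: int
) -> Dict[FrozenSet[str], int]:
    """Sketch of the cumulative intersection scheme (dictionary-based).

    Assumption: a flat dict serves as the repository; the paper's prefix
    tree representation is NOT reproduced here.
    """
    repository: Dict[FrozenSet[str], int] = {}
    for transaction in transactions:
        t = frozenset(transaction)
        if not t:
            continue  # an empty transaction contributes no item sets
        # Support each candidate had BEFORE this transaction was seen.
        # The transaction itself is always a candidate (support 0 if new).
        updates: Dict[FrozenSet[str], int] = {t: 0}
        for item_set, support in repository.items():
            common = item_set & t
            if common:
                # Several stored sets may yield the same intersection;
                # the largest of their supports is the correct old support.
                updates[common] = max(updates.get(common, 0), support)
        # Every candidate is contained in the new transaction, so add one.
        for item_set, support in updates.items():
            repository[item_set] = support + 1
    # Filtering happens only at the end in this sketch.
    return {s: n for s, n in repository.items() if n >= min_support}


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "c"}, {"a", "b", "c"}]
    for item_set, support in closed_frequent_item_sets(data, 2).items():
        print(sorted(item_set), support)
```

On the four example transactions with a minimum support of 2, the sketch reports the closed item sets {a} with support 4, {a, b} and {a, c} with support 3, and {a, b, c} with support 2. The repository can grow quickly because infrequent intersections are kept until the end, which is one reason a more compact representation and pruning are attractive in practice.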