Top-down mining of frequent closed patterns from very high dimensional data

Authors:
Hongyan Liu; Xiaoyu Wang; Jun He; Jiawei Han; Dong Xin; Zheng Shao
Affiliations (in author order):
Department of Management Science and Engineering, Tsinghua University, Beijing 100084, China
Department of Management Science and Engineering, Tsinghua University, Beijing 100084, China
Department of Computer Science, Renmin University of China, Beijing 100872, China
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
Venue:
Information Sciences: an International Journal
Year:
2009

References (17):
Mining frequent patterns by pattern-growth: methodology and implications. ACM SIGKDD Explorations Newsletter - Special issue on “Scalable data mining algorithms”
CMAR: Accurate and Efficient Classification Based on Multiple Class-Association Rules. ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Biclustering of Expression Data. Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology
Fast Algorithms for Mining Association Rules in Large Databases. VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Enhanced Biclustering on Expression Data. BIBE '03 Proceedings of the 3rd IEEE Symposium on BioInformatics and BioEngineering
Using transposition for pattern discovery from microarray data. DMKD '03 Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery
Carpenter: finding closed patterns in long biological datasets. Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
COBBLER: Combining Column and Row Enumeration for Closed Pattern Discovery. SSDBM '04 Proceedings of the 16th International Conference on Scientific and Statistical Database Management
FARMER: finding interesting rule groups in microarray datasets. SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Mining top-K covering rule groups for gene expression data. Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Mining spatial association rules in image databases. Information Sciences: an International Journal
An efficient algorithm for mining frequent inter-transaction patterns. Information Sciences: an International Journal
Efficient mining of weighted interesting patterns with a strong weight and/or support affinity. Information Sciences: an International Journal
Frequent pattern mining: current status and future directions. Data Mining and Knowledge Discovery
On the strength of hyperclique patterns for text categorization. Information Sciences: an International Journal
Discovery of maximum length frequent itemsets. Information Sciences: an International Journal
Incremental and interactive mining of web traversal patterns. Information Sciences: an International Journal

Cited by (10):
An algorithm to mine general association rules from tabular data. Information Sciences: an International Journal
Effective vaccination policies. Information Sciences: an International Journal
Toward boosting distributed association rule mining by data de-clustering. Information Sciences: an International Journal
Fast algorithm for computing fixpoints of Galois connections induced by object-attribute relational data. Information Sciences: an International Journal
Generalized association rule mining with constraints. Information Sciences: an International Journal
Looking for a structural characterization of the sparseness measure of (frequent closed) itemset contexts. Information Sciences: an International Journal
DisClose: discovering colossal closed itemsets via a memory efficient compact row-tree. PAKDD'12 Proceedings of the 2012 Pacific-Asia conference on Emerging Trends in Knowledge Discovery and Data Mining
PMBC: Pattern mining from biological sequences with wildcard constraints. Computers in Biology and Medicine
Closed inter-sequence pattern mining. Journal of Systems and Software
A new method for mining disjunctive emerging patterns in high-dimensional datasets using hypergraphs. Information Systems

Abstract

Frequent pattern mining is an essential theme in data mining. Existing algorithms usually use a bottom-up search strategy. For very high dimensional data, however, this strategy cannot fully exploit the minimum support constraint to prune the rowset search space. In this paper, we propose a new method called top-down mining, together with a novel row enumeration tree, to make full use of the pruning power of the minimum support constraint. Furthermore, to check efficiently whether a rowset is closed, we develop a trace-based method. Based on these methods, we design an algorithm called TD-Close for mining the complete set of frequent closed patterns. To enhance its performance further, we add new pruning strategies and new data structures, which lead to a new algorithm, TTD-Close. Our performance study shows that the top-down strategy is effective in cutting down the search space and saving memory, while the trace-based method facilitates closedness checking. As a result, TTD-Close outperforms bottom-up search algorithms such as Carpenter and FPclose in most cases. It also runs faster than TD-Close.
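
To make the pruning argument concrete, the following is a minimal toy sketch of top-down rowset enumeration. It is not TD-Close or TTD-Close: the function name `closed_patterns_top_down` and the sample data are hypothetical, the row enumeration tree and the trace-based check are replaced by a plain visited set, and the running time is exponential in the number of rows. It only illustrates why starting from the full rowset lets the minimum support constraint prune directly: rowsets shrink along each branch, so a branch can stop as soon as its rowset reaches the minimum support size.

```python
def closed_patterns_top_down(rows, min_sup):
    """Toy top-down rowset enumeration (illustrative only, not TD-Close).

    rows    : list of item collections, one per row (e.g. one per sample)
    min_sup : minimum number of rows that must contain a pattern (>= 1)
    Returns a dict mapping each frequent closed pattern to its support.
    """
    if min_sup < 1:
        raise ValueError("min_sup must be at least 1")
    rows = [set(r) for r in rows]
    n = len(rows)
    results = {}
    if n < min_sup:
        return results

    seen = set()                       # rowsets already expanded
    stack = [frozenset(range(n))]      # start from the full rowset
    while stack:
        rowset = stack.pop()
        if rowset in seen:
            continue
        seen.add(rowset)

        # Items shared by every row in the rowset. An intersection of rows
        # is always a closed pattern.
        pattern = frozenset(set.intersection(*(rows[r] for r in rowset)))
        if pattern:
            # Support counts every row containing the pattern, which is
            # at least len(rowset) and therefore at least min_sup.
            results[pattern] = sum(1 for row in rows if pattern <= row)

        # Children drop one row each. Every rowset below a rowset of size
        # min_sup is too small, so the branch ends here.
        if len(rowset) > min_sup:
            for r in rowset:
                stack.append(rowset - {r})
    return results


if __name__ == "__main__":
    table = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
    for pattern, support in closed_patterns_top_down(table, 2).items():
        print(sorted(pattern), support)
```

On the four-row sample table with a minimum support of 3, only rowsets of size 4 and 3 are ever visited, and the closed patterns found are the three single items, each with support 3. A bottom-up search would instead start from small rowsets, none of which can be discarded by the support threshold alone.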