DisClose: discovering colossal closed itemsets via a memory efficient compact row-tree

Authors:
Nurul F. Zulkurnain;David J. Haglin;John A. Keane
Affiliations:
Department of Electrical and Computer Engineering, Kulliyyah of Engineering, International Islamic University Malaysia, Kuala Lumpur, Malaysia, School of Computer Science, University of Manchester, ...; High Performance Computing, Pacific Northwest National Laboratory, Richland, WA; School of Computer Science, University of Manchester, Manchester, UK
Venue:
PAKDD'12 Proceedings of the 2012 Pacific-Asia conference on Emerging Trends in Knowledge Discovery and Data Mining
Year:
2012

Abstract

A recent focus in itemset mining has been the discovery of frequent itemsets from high-dimensional datasets. Because running time grows exponentially with average row length, most conventional algorithms are impractical for mining such datasets. Unfortunately, in this type of dataset large-cardinality itemsets are likely to be more informative than small-cardinality ones. This paper proposes an approach, termed DisClose, to extract large-cardinality (colossal) closed itemsets from high-dimensional datasets. The approach relies on a Compact Row-Tree data structure to represent itemsets during the search process. Large-cardinality itemsets are enumerated first, followed by smaller ones. In addition, a minimum cardinality threshold is used to further reduce the search space. Experimental results show that DisClose can extract colossal closed itemsets from the datasets tested, even for low support thresholds. The algorithm discovers closed itemsets directly, without needing to check whether each new closed itemset has previously been found.
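
To make the terms in the abstract concrete, the following is a minimal Python sketch of row enumeration for closed itemsets with a minimum cardinality threshold. It is not the DisClose algorithm and does not build a Compact Row-Tree. It only illustrates two facts: the intersection of any group of rows is a closed itemset, and an intersection can only shrink as more rows are added, so branches that fall below the cardinality threshold can be pruned. The function name `closed_itemsets`, its parameters, and the toy data are assumptions made for illustration. Unlike the approach described above, this sketch does use an explicit "already found" check.

```python
from typing import Dict, FrozenSet, List, Set


def closed_itemsets(rows: List[Set[str]], min_support: int,
                    min_cardinality: int) -> Dict[FrozenSet[str], int]:
    """Return {closed itemset: support} for itemsets meeting both thresholds.

    Illustrative brute-force row enumeration (hypothetical helper), not the
    DisClose algorithm: it visits combinations of rows depth-first and
    intersects their items.
    """
    found: Dict[FrozenSet[str], int] = {}

    def extend(start: int, items: FrozenSet[str], depth: int) -> None:
        # `depth` is the number of rows already combined into `items`.
        for i in range(start, len(rows)):
            common = items & rows[i] if depth else frozenset(rows[i])
            # Adding rows can only shrink the intersection, so a branch
            # below the cardinality threshold can never recover.
            if len(common) < min_cardinality:
                continue
            # At least `depth + 1` rows contain `common`.
            if depth + 1 >= min_support and common not in found:
                # Exact support: count every row that contains the itemset.
                found[common] = sum(1 for r in rows if common <= r)
            extend(i + 1, common, depth + 1)

    extend(0, frozenset(), 0)
    return found


# Toy dataset (assumed for illustration): each row is a set of items.
rows = [{"a", "b", "c", "d"}, {"a", "b", "c"}, {"a", "b", "d"}, {"c", "d"}]

for itemset, support in closed_itemsets(rows, min_support=2,
                                        min_cardinality=3).items():
    print(sorted(itemset), support)
```

On this toy data, raising `min_cardinality` from 2 to 3 discards the branches that lead to the smaller itemsets `{a, b}` and `{c, d}`, which is the kind of search-space reduction the threshold is meant to provide.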