COBBLER: Combining Column and Row Enumeration for Closed Pattern Discovery

  • Authors:
  • Feng Pan;Anthony K. H. Tung;Gao Cong;Xin Xu

  • Affiliations:
  • National University of Singapore;National University of Singapore;National University of Singapore;National University of Singapore

  • Venue:
  • SSDBM '04 Proceedings of the 16th International Conference on Scientific and Statistical Database Management
  • Year:
  • 2004

Abstract

The problem of mining frequent closed patterns has received considerable attention recently, as it promises much less redundancy than discovering all frequent patterns. Existing algorithms can be separated into two groups: feature (column) enumeration and row enumeration. Feature enumeration algorithms like CHARM and CLOSET+ are efficient for datasets with a small number of features and a large number of rows, since the number of feature combinations to be enumerated is small. Row enumeration algorithms like CARPENTER, on the other hand, are more suitable for datasets (e.g., bioinformatics data) with a large number of features and a small number of rows. Both groups of algorithms, however, encounter problems on datasets that have a large number of both rows and features. In this paper, we describe a new algorithm called COBBLER which can efficiently mine such datasets. COBBLER is designed to dynamically switch between feature enumeration and row enumeration depending on the data characteristics in the process of mining. As such, each portion of the dataset can be processed using the most suitable method, making the mining more efficient. Several experiments on real-life and synthetic datasets show that COBBLER is an order of magnitude better than previous closed pattern mining algorithms like CHARM, CLOSET+ and CARPENTER.
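To make the two enumeration directions concrete, the sketch below finds frequent closed patterns on a toy dataset in both ways. It is a brute-force illustration only, not COBBLER, CHARM, CLOSET+ or CARPENTER: the dataset, the function names and the exhaustive search are all assumptions made for this example, and the dynamic switching described in the abstract is not modeled. It assumes `min_support` is at least 1.

```python
from itertools import combinations

# Hypothetical toy dataset: each row is the set of features it contains.
ROWS = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"d"},
]

def rows_containing(itemset, rows):
    """Indices of the rows that contain every feature in itemset."""
    return frozenset(i for i, r in enumerate(rows) if itemset <= r)

def common_features(row_ids, rows):
    """Features shared by every row in row_ids (row_ids must be non-empty)."""
    return frozenset(set.intersection(*(rows[i] for i in row_ids)))

def closed_by_feature_enumeration(rows, min_support):
    """Enumerate feature combinations; cost grows with the number of features."""
    features = sorted(set().union(*rows))
    closed = {}
    for k in range(1, len(features) + 1):
        for combo in combinations(features, k):
            support_rows = rows_containing(set(combo), rows)
            if len(support_rows) >= min_support:
                # The closure of a pattern is everything its supporting rows share.
                closed[common_features(support_rows, rows)] = len(support_rows)
    return closed

def closed_by_row_enumeration(rows, min_support):
    """Enumerate row combinations; cost grows with the number of rows."""
    closed = {}
    for k in range(min_support, len(rows) + 1):
        for combo in combinations(range(len(rows)), k):
            # The intersection of a set of rows is always a closed pattern.
            pattern = common_features(combo, rows)
            if pattern:
                closed[pattern] = len(rows_containing(pattern, rows))
    return closed

by_feature = closed_by_feature_enumeration(ROWS, min_support=2)
by_row = closed_by_row_enumeration(ROWS, min_support=2)
print(by_feature == by_row)
```

On this dataset, seven feature combinations are frequent at a minimum support of 2, but only two of them are closed ({a, b} with support 3 and {a, b, c} with support 2), which is the redundancy reduction the abstract refers to. Both functions return the same result; they differ only in which dimension of the data they search over.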