Direct mining of discriminative and essential frequent patterns via model-based search tree

Authors:
Wei Fan;Kun Zhang;Hong Cheng;Jing Gao;Xifeng Yan;Jiawei Han;Philip Yu;Olivier Verscheure
Affiliations:
IBM T.J. Watson, Hawthorne, NY, USA;Xavier University of Louisiana, New Orleans, LA, USA;University of Illinois at Urbana-Champaign, Urbana-Champaign, IL, USA;University of Illinois at Urbana-Champaign, Urbana-Champaign, IL, USA;IBM T.J. Watson, Hawthorne, NY, USA;University of Illinois at Urbana-Champaign, Urbana-Champaign, IL, USA;University of Illinois at Chicago, Chicago, IL, USA;IBM T.J. Watson, Hawthorne, NY, USA
Venue:
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2008

Abstract

Frequent patterns provide solutions to datasets that do not have well-structured feature vectors. However, frequent pattern mining is non-trivial since the number of unique patterns is exponential and many of them are non-discriminative and correlated. Currently, frequent pattern mining is performed in two sequential steps: enumerating a set of frequent patterns, followed by feature selection. Although many methods have been proposed in the past few years on how to perform each separate step efficiently, there has still been limited success in eventually finding highly compact and discriminative patterns. The culprit is the inherent nature of this widely adopted two-step approach. This paper discusses these problems and proposes a new and different method. It builds a decision tree that partitions the data into different nodes. Then, at each node, it directly discovers a discriminative pattern to further divide the node's examples into purer subsets. Since the number of examples toward the leaf level is relatively small, the new approach is able to examine patterns with extremely low global support that could not be enumerated on the whole dataset by the two-step method. On some of the most difficult graph and frequent itemset problems, the discovered feature vectors are more accurate than those of most recently proposed algorithms, while their total size is typically at least 50% smaller. Importantly, the minimum support of some discriminative patterns can be extremely low (e.g., 0.03%). In order to enumerate these low-support patterns, state-of-the-art frequent pattern algorithms either cannot finish due to huge memory consumption or have to enumerate 10^1 to 10^3 times more patterns before they can even be found. Software and datasets are available by contacting the authors.
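The node-by-node procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' software: it assumes itemset data (each example is a set of items), uses a naive Apriori-style enumerator in place of a real frequent-pattern miner, and scores candidate patterns by information gain. All names (`build_mbt`, `min_local_support`, `min_node_size`, `effective_global_support`) are hypothetical. The last helper illustrates why mining locally reaches low global support: a local threshold θ at a node holding n of N examples corresponds to a global support of θ·n/N, so, for instance, a 20% threshold at a 15-example node in a 10,000-example dataset amounts to 0.03%.

```python
# Hypothetical sketch of a model-based search tree over itemsets (not the authors' code).
from collections import Counter
from itertools import combinations
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def frequent_itemsets(transactions, min_support, max_len=3):
    """Apriori-style level-wise enumeration of itemsets whose support, counted
    only on the transactions passed in (i.e. locally at a tree node), is at
    least min_support (a fraction of len(transactions))."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    level = [frozenset([item]) for item in items]
    result, k = [], 1
    while level and k <= max_len:
        frequent = [s for s in level
                    if sum(1 for t in transactions if s <= t) >= min_support * n]
        result.extend(frequent)
        # Join step: merge pairs of frequent k-itemsets into (k+1)-item candidates.
        candidates = {a | b for a, b in combinations(frequent, 2) if len(a | b) == k + 1}
        level = sorted(candidates, key=sorted)
        k += 1
    return result


def info_gain(pattern, transactions, labels):
    """Information gain of splitting examples on 'contains pattern' vs. 'does not'."""
    inside = [y for t, y in zip(transactions, labels) if pattern <= t]
    outside = [y for t, y in zip(transactions, labels) if not pattern <= t]
    n = len(labels)
    return (entropy(labels)
            - len(inside) / n * entropy(inside)
            - len(outside) / n * entropy(outside))


def build_mbt(transactions, labels, min_local_support=0.2, min_node_size=2, found=None):
    """Grow the tree: at each node, mine patterns on that node's examples only,
    keep the most discriminative one, split on it, and recurse.
    Returns the selected patterns in the order they were discovered."""
    if found is None:
        found = []
    # Leaf: too few examples left, or the node is already pure.
    if len(labels) < min_node_size or len(set(labels)) <= 1:
        return found
    candidates = frequent_itemsets(transactions, min_local_support)
    if not candidates:
        return found
    best = max(candidates, key=lambda p: info_gain(p, transactions, labels))
    if info_gain(best, transactions, labels) <= 1e-12:
        return found  # no pattern makes the partition purer
    found.append(best)
    # Positive gain implies both children are non-empty, so recursion terminates.
    inside = [i for i, t in enumerate(transactions) if best <= t]
    outside = [i for i, t in enumerate(transactions) if not best <= t]
    for idx in (inside, outside):
        build_mbt([transactions[i] for i in idx], [labels[i] for i in idx],
                  min_local_support, min_node_size, found)
    return found


def effective_global_support(min_local_support, node_size, n_total):
    """Global support implied by a local threshold at a node with node_size of n_total examples."""
    return min_local_support * node_size / n_total


if __name__ == "__main__":
    # Toy data: positives contain both x and y; negatives contain at most one of them.
    data = [{"x", "y"}, {"x", "y", "z"}, {"x", "y", "w"},
            {"x", "z"}, {"y", "w"}, {"x", "w"}, {"y", "z"}, {"z", "w"}]
    y = [1, 1, 1, 0, 0, 0, 0, 0]
    patterns = build_mbt(data, y)
    print([sorted(p) for p in patterns])  # [['x', 'y']]
    print(effective_global_support(0.2, 15, 10_000))
```

Because each recursive call mines only the examples that reach it, the absolute support count required at a node shrinks as the tree deepens. A practical version would also cap tree depth and substitute an efficient miner, such as FP-growth for itemsets or a subgraph miner for graph data.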