Constructing classification features using minimal predictive patterns

Authors:
Iyad Batal; Milos Hauskrecht
Affiliations:
University of Pittsburgh, Pittsburgh, PA, USA; University of Pittsburgh, Pittsburgh, PA, USA
Venue:
CIKM '10: Proceedings of the 19th ACM International Conference on Information and Knowledge Management
Year:
2010

Abstract

Choosing good features to represent objects can be crucial to the success of supervised machine learning methods. Recently, there has been great interest in applying data mining techniques to construct new classification features. The rationale behind this approach is that patterns (feature-value combinations) can capture more of the underlying semantics than single features can. Hence, including some of these patterns as features can improve classification performance. Most current methods adopt a two-phase approach: they generate all frequent patterns in the first phase and select the discriminative ones in the second. However, this approach has had limited success because it is usually very difficult to correctly identify the important predictive patterns within a large set of highly correlated frequent patterns. In this paper, we introduce the minimal predictive patterns framework to directly mine a compact set of highly predictive patterns. The idea is to integrate pattern mining and feature selection in order to filter out non-informative and redundant patterns while they are being generated. We also propose several pruning techniques to speed up the mining process. Our extensive experimental evaluation on many datasets demonstrates the advantage of our method, which outperforms many well-known classifiers.
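To make the idea of filtering patterns during generation more concrete, here is a minimal Python sketch under simplifying assumptions of our own; it is not the authors' algorithm. Patterns are sets of `feature=value` items, a pattern's predictiveness is its precision for a target class, and a pattern is kept only if that precision beats every proper subpattern (the empty pattern stands for the class prior) by a fixed margin. The names `mine_minimal_predictive_patterns`, `to_binary_features`, `min_support`, and `min_gain` are hypothetical. In particular, `min_gain` is a crude stand-in for a proper statistical test, and the only pruning shown is standard Apriori support pruning. The abstract does not describe the paper's own pruning techniques, so they are not reproduced here.

```python
def mine_minimal_predictive_patterns(transactions, labels, target,
                                     min_support=2, min_gain=0.1):
    """Level-wise sketch of direct mining of 'minimal predictive' patterns.

    A frequent pattern is kept only if its precision for `target` exceeds the
    best precision over all its proper subpatterns (the empty pattern stands
    for the class prior) by at least `min_gain`. Both `min_gain` and this
    precision criterion are assumptions, not the paper's exact test.
    """
    n = len(transactions)
    prior = sum(1 for y in labels if y == target) / n

    def stats(pattern):
        # Support = number of transactions containing the pattern;
        # precision = fraction of those transactions labeled `target`.
        covered = [y for t, y in zip(transactions, labels) if pattern <= t]
        if not covered:
            return 0, 0.0
        return len(covered), sum(1 for y in covered if y == target) / len(covered)

    empty = frozenset()
    precision = {empty: prior}
    best_sub = {empty: float("-inf")}  # best precision over proper subpatterns
    items = sorted({i for t in transactions for i in t})
    level, result = [empty], []

    while level:
        frequent_prev = set(level)
        candidates = set()
        for p in level:
            for item in items:
                if item in p:
                    continue
                c = p | {item}
                # Apriori pruning: every immediate subpattern must be frequent.
                if all(c - {x} in frequent_prev for x in c):
                    candidates.add(c)
        next_level = []
        for c in candidates:
            support, prec = stats(c)
            if support < min_support:
                continue
            # Taking the max over immediate subpatterns covers every proper
            # subpattern, because best_sub already holds their maxima.
            best = max(max(precision[c - {x}], best_sub[c - {x}]) for x in c)
            precision[c], best_sub[c] = prec, best
            # Non-minimal patterns are still extended: a superset may improve.
            next_level.append(c)
            if prec >= best + min_gain:
                result.append((c, prec))
        level = next_level
    return result


def to_binary_features(transactions, patterns):
    # One 0/1 column per mined pattern: 1 if the transaction contains it.
    return [[int(p <= t) for p, _ in patterns] for t in transactions]


# Toy data (made up for illustration): items are "feature=value" strings.
transactions = [{"a=1", "b=1"}, {"a=1", "b=0"}, {"a=0", "b=1"},
                {"a=0", "b=0"}, {"a=1", "b=1"}, {"a=0", "b=1"}]
labels = [1, 1, 0, 0, 1, 0]

patterns = []
for cls in (1, 0):
    patterns += mine_minimal_predictive_patterns(transactions, labels, cls)
features = to_binary_features(transactions, patterns)
```

On this toy data, `b=1` is discarded as non-informative because its precision for class 1 equals the prior. The pair `{a=1, b=1}` is discarded as redundant because it predicts class 1 no better than `a=1` alone. This leaves one pattern per class, `a=1` and `a=0`, as binary features. Non-minimal patterns are still extended because predictiveness is not monotone: a superset can be predictive even when its subsets are not.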