NDPMine: efficiently mining discriminative numerical features for pattern-based classification

Authors:
Hyungsul Kim; Sangkyum Kim; Tim Weninger; Jiawei Han; Tarek Abdelzaher
Affiliations:
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL (all five authors)
Venue:
ECML PKDD'10: Proceedings of the 2010 European Conference on Machine Learning and Knowledge Discovery in Databases, Part II
Year:
2010

Abstract

Pattern-based classification has demonstrated its power in recent studies. Because mining discriminative patterns as classification features is very expensive, several efficient algorithms have been proposed to reduce this cost. These algorithms assume that the feature values of the mined patterns are binary, i.e., a pattern either occurs or does not. In some problems, however, the number of times a pattern occurs is more informative than whether it occurs at all. To address these deficiencies, we propose a mathematical programming method that directly mines discriminative patterns as numerical features for classification. We also propose a novel search-space shrinking technique that addresses the inefficiencies of iterative pattern mining algorithms. Finally, we show that our method is an order of magnitude faster, significantly more memory efficient, and more accurate than current approaches.
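
The distinction between binary and numerical pattern features can be illustrated with a small sketch. This is not the NDPMine algorithm and involves no mathematical programming; it only shows why an occurrence count can separate classes that a presence flag cannot. The toy sequences, the labels, the choice of contiguous substrings as patterns, and the threshold of 2 are all assumptions made for illustration.

```python
def count_occurrences(sequence, pattern):
    """Count (possibly overlapping) occurrences of a contiguous pattern."""
    n, m = len(sequence), len(pattern)
    return sum(1 for i in range(n - m + 1) if sequence[i:i + m] == pattern)


def binary_feature(sequence, pattern):
    """1 if the pattern occurs at least once, else 0."""
    return 1 if count_occurrences(sequence, pattern) > 0 else 0


def numerical_feature(sequence, pattern):
    """Number of times the pattern occurs."""
    return count_occurrences(sequence, pattern)


# Hypothetical toy data: every sequence contains "ab",
# but the positive class contains it more often.
data = [
    ("abababab", +1),
    ("ababxab", +1),
    ("abxxxx", -1),
    ("xxabxx", -1),
]
pattern = "ab"
labels = [label for _, label in data]

binary = [binary_feature(seq, pattern) for seq, _ in data]
counts = [numerical_feature(seq, pattern) for seq, _ in data]

# The binary feature is identical for all examples, so it cannot separate
# the classes; a simple threshold on the count (assumed here to be 2) can.
predicted = [+1 if c >= 2 else -1 for c in counts]

print("binary features:", binary)
print("count features: ", counts)
print("predictions:    ", predicted)
```

On this toy data the binary feature takes the same value for every sequence, while the count feature differs between the two classes, which is the situation the abstract describes.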