On Mining Instance-Centric Classification Rules

Authors:
Jianyong Wang;George Karypis
Affiliations:
IEEE Computer Society;IEEE Computer Society
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2006

Citing 23
Cited 10

C4.5: programs for machine learning

C4.5: programs for machine learning
Mining association rules between sets of items in large databases

SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Towards language independent automated learning of text categorization models

SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Inductive learning algorithms and representations for text categorization

Proceedings of the seventh international conference on Information and knowledge management
Mining the most interesting rules

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining features for sequence classification

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining frequent patterns without candidate generation

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
An Evaluation of Statistical Approaches to Text Categorization

Information Retrieval
A tree projection algorithm for generation of frequent item sets

Journal of Parallel and Distributed Computing - Special issue on high-performance data mining
On feature distributional clustering for text categorization

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Using conjunction of attribute values for classification

Proceedings of the eleventh international conference on Information and knowledge management
FOIL: A Midterm Report

ECML '93 Proceedings of the European Conference on Machine Learning
Text Categorization with Suport Vector Machines: Learning with Many Relevant Features

ECML '98 Proceedings of the 10th European Conference on Machine Learning
Clustering Association Rules

ICDE '97 Proceedings of the Thirteenth International Conference on Data Engineering
CMAR: Accurate and Efficient Classification Based on Multiple Class-Association Rules

ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Fast Algorithms for Mining Association Rules in Large Databases

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Constructing Efficient Decision Trees by Using Optimized Numeric Association Rules

VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
Text Document Categorization by Term Association

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
XRules: an effective structural classifier for XML data

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
DeEPs: A New Instance-Based Lazy Discovery and Classification System

Machine Learning
FARMER: finding interesting rule groups in microarray datasets

SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Efficient closed pattern mining in the presence of tough block constraints

Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining top-K covering rule groups for gene expression data

Proceedings of the 2005 ACM SIGMOD international conference on Management of data

Efficient itemset generator discovery over a stream sliding window

Proceedings of the 18th ACM conference on Information and knowledge management
Direct mining of discriminative patterns for classifying uncertain data

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
Input space reduction for rule based classification

WSEAS Transactions on Information Science and Applications
Efficient computation of measurements of correlated patterns in uncertain data

ADMA'11 Proceedings of the 7th international conference on Advanced Data Mining and Applications - Volume Part I
Classification based on specific rules and inexact coverage

Expert Systems with Applications: An International Journal
Top-k interesting phrase mining in ad-hoc collections using sequence pattern indexing

Proceedings of the 15th International Conference on Extending Database Technology
I-prune: Item selection for associative classification

International Journal of Intelligent Systems
Editorial: Parameter-free classification in multi-class imbalanced data sets

Data & Knowledge Engineering
CAR-NF: A classifier based on specific rules with high netconf

Intelligent Data Analysis
Key roles of closed sets and minimal generators in concise representations of frequent patterns

Intelligent Data Analysis

Quantified Score

Hi-index	0.00

Visualization

Abstract

Many studies have shown that rule-based classifiers perform well in classifying categorical and sparse high-dimensional databases. However, a fundamental limitation with many rule-based classifiers is that they find the rules by employing various heuristic methods to prune the search space and select the rules based on the sequential database covering paradigm. As a result, the final set of rules that they use may not be the globally best rules for some instances in the training database. To make matters worse, these algorithms fail to fully exploit some more effective search space pruning methods in order to scale to large databases. In this paper, we present a new classifier, HARMONY, which directly mines the final set of classification rules. HARMONY uses an instance-centric rule-generation approach and it can assure that, for each training instance, one of the highest-confidence rules covering this instance is included in the final rule set, which helps in improving the overall accuracy of the classifier. By introducing several novel search strategies and pruning methods into the rule discovery process, HARMONY also has high efficiency and good scalability. Our thorough performance study with some large text and categorical databases has shown that HARMONY outperforms many well-known classifiers in terms of both accuracy and computational efficiency and scales well with regard to the database size.