Concept learning and the problem of small disjuncts

  • Authors:
  • Robert C. Holte; Liane E. Acker; Bruce W. Porter

  • Affiliations:
  • Computer Science Department, University of Ottawa, Ottawa, Canada; Department of Computer Sciences, University of Texas at Austin, Austin, Texas; Department of Computer Sciences, University of Texas at Austin, Austin, Texas

  • Venue:
  • IJCAI'89 Proceedings of the 11th international joint conference on Artificial intelligence - Volume 1
  • Year:
  • 1989

Abstract

Ideally, definitions induced from examples should consist of all, and only, disjuncts that are meaningful (e.g., as measured by a statistical significance test) and have a low error rate. Existing inductive systems create definitions that are ideal with regard to large disjuncts, but far from ideal with regard to small disjuncts, where a small (large) disjunct is one that correctly classifies few (many) training examples. The problem with small disjuncts is that many of them have high rates of misclassification, and it is difficult to eliminate the error-prone small disjuncts from a definition without adversely affecting other disjuncts in the definition. Various approaches to this problem are evaluated, including the novel approach of using a bias different from the "maximum generality" bias. This approach, and some others, prove partly successful, but the problem of small disjuncts remains open.
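The distinction the abstract draws can be made concrete with a small sketch. The code below is a hypothetical illustration, not the authors' method: each disjunct (rule) is characterized by how many training examples it covers and how many of those it misclassifies, and a disjunct is deemed "small" when it correctly classifies few training examples. The class name, threshold, and all numbers are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Disjunct:
    """One disjunct (rule) of an induced definition, summarized by its
    behavior on the training set. All fields are illustrative."""
    name: str
    covered: int   # training examples the disjunct matches
    errors: int    # of those, how many it misclassifies

    @property
    def error_rate(self) -> float:
        # Fraction of covered training examples that are misclassified.
        return self.errors / self.covered if self.covered else 0.0

def small_disjuncts(disjuncts, threshold):
    """A disjunct is 'small' if it correctly classifies at most
    `threshold` training examples (per the abstract's definition)."""
    return [d for d in disjuncts if (d.covered - d.errors) <= threshold]

# Fabricated numbers reflecting the pattern the paper describes:
# the large disjunct is accurate; the small ones have much higher
# (or unreliable) error rates despite covering few examples.
rules = [
    Disjunct("r1", covered=120, errors=3),   # large, ~2.5% error
    Disjunct("r2", covered=4, errors=2),     # small, 50% error
    Disjunct("r3", covered=3, errors=0),     # small, untrustworthy sample
]
small = small_disjuncts(rules, threshold=5)
```

Here `small` picks out `r2` and `r3`; the difficulty the paper studies is that simply deleting such disjuncts (or generalizing their neighbors to absorb their examples) tends to degrade the accuracy of the remaining disjuncts.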