Improving the design of induction methods by analyzing algorithm functionality and data-based concept complexity

Authors:
Larry Rendell; Harish Ragavan
Affiliations:
Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, IL; Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, IL
Venue:
IJCAI'93 Proceedings of the 13th international joint conference on Artificial intelligence - Volume 2
Year:
1993

Abstract

Although empirical machine learning has seen many algorithms, one of its most important goals has been neglected. Important real-world problems often have just a primitive representation, to which the target concept bears only a remote, obscure relationship. This consideration leads to a class of measures that may be applied to data to estimate difficulty for standard algorithms. As the concept becomes harder, current decision tree and decision list methods give increasingly poor accuracy, though backpropagation does better. A new system for feature construction scales up best. The fundamental limitation of standard algorithms is caused by two problems: greedy search and representational inadequacy. Critical analysis and empirical results show that lookahead alleviates the greedy hill-climbing problem at high cost, but even this is insufficient. Combining lookahead with feature construction alleviates the "complex global replication" problem with hard concepts. For principled algorithm development and good progress, researchers need to study hard concepts and system behavior using them.
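
The greedy-search limitation described in the abstract can be illustrated with a small sketch. This is not the authors' system or their experimental setup; it is a toy example under stated assumptions. The concept (an exclusive-or of two binary attributes plus one irrelevant attribute), the entropy-based split score, and the function names `entropy`, `split_entropy`, and `lookahead_entropy` are all hypothetical choices made here for illustration. The sketch compares the information gain of every single-attribute split with the gain of one two-level (lookahead) split.

```python
from itertools import product
from math import log2


def entropy(labels):
    """Binary class entropy (in bits) of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return -sum(q * log2(q) for q in (p, 1 - p) if q > 0)


def split_entropy(rows, attr):
    """Weighted class entropy after splitting rows on one binary attribute."""
    total = 0.0
    for v in (0, 1):
        part = [y for x, y in rows if x[attr] == v]
        total += len(part) / len(rows) * entropy(part)
    return total


def lookahead_entropy(rows, a, b):
    """Weighted entropy after splitting on a, then on b inside each branch."""
    total = 0.0
    for v in (0, 1):
        branch = [(x, y) for x, y in rows if x[a] == v]
        if branch:
            total += len(branch) / len(rows) * split_entropy(branch, b)
    return total


# Hypothetical toy concept: y = x0 XOR x1, with x2 irrelevant.
rows = [(x, x[0] ^ x[1]) for x in product((0, 1), repeat=3)]
base = entropy([y for _, y in rows])

# One-step (greedy) gain for each attribute taken alone.
greedy_gains = [base - split_entropy(rows, a) for a in range(3)]

# Gain of the two-level split on x0 followed by x1.
lookahead_gain = base - lookahead_entropy(rows, 0, 1)

print("greedy gains:", greedy_gains)
print("lookahead gain (x0 then x1):", lookahead_gain)
```

On this toy concept every single-attribute split has zero gain, so a greedy learner cannot tell the relevant attributes from the irrelevant one, whereas the two-level split separates the classes completely. Evaluating all attribute pairs grows quadratically with the number of attributes, which is one way to read the abstract's remark that lookahead comes at high cost.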