Theoretical and Practical Considerations of Uncertainty and Complexity in Automated Knowledge Acquisition

Authors:
Xiao-Jia M. Zhou; Tharam S. Dillon
Affiliations:
-;-
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
1995

Citing 11
Cited 1

A theory of the learnable. Communications of the ACM.
Inductive knowledge acquisition: a case study. Proceedings of the Second Australian Conference on Applications of expert systems.
Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM (JACM).
Empirical Learning as a Function of Concept Character. Machine Learning.
A Distance-Based Attribute Selection Measure for Decision Tree Induction. Machine Learning.
Letter Recognition Using Holland-Style Adaptive Classifiers. Machine Learning.
Rule induction with CN2: some recent improvements. EWSL-91 Proceedings of the European working session on learning on Machine learning.
A Statistical-Heuristic Feature Selection Criterion for Decision Tree Induction. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Decision Trees and Diagrams. ACM Computing Surveys (CSUR).
Computers and Intractability: A Guide to the Theory of NP-Completeness.
Induction of Decision Trees. Machine Learning.

Correction to a Footnote in "Theoretical and Practical Considerations of Uncertainty and Complexity in Automated Knowledge Acquisition". IEEE Transactions on Knowledge and Data Engineering.

Abstract

Inductive machine learning has become an important approach to automated knowledge acquisition from databases. The disjunctive normal form (DNF), as the common analytic representation of decision trees and decision tables (rules), provides a basis for formal analysis of uncertainty and complexity in inductive learning. In this paper, a theory for general decision trees is developed based on Shannon's expansion of the discrete DNF, and a probabilistic induction system, PIK, is further developed for extracting knowledge from real-world data. We then combine formal and practical approaches to study how data characteristics affect uncertainty and complexity in inductive learning. Three important data characteristics, namely disjunctiveness, noise, and incompleteness, are studied. The combination of leveled-pruning, leveled-condensing, and resampling-estimation turns out to be a very powerful method for dealing with highly disjunctive and inadequate data. Finally, the PIK system is compared with other recent inductive learning systems on a number of real-world domains.