Investigating ID3-Induced rules from low-dimensional data cleaned by complete case analysis

Authors:
Jeanette Auer;Richard Hall
Affiliations:
Department of Computer Science and Computer Engineering, La Trobe University, Bundoora, Victoria, Australia;Department of Computer Science and Computer Engineering, La Trobe University, Bundoora, Victoria, Australia
Venue:
AI'04 Proceedings of the 17th Australian joint conference on Advances in Artificial Intelligence
Year:
2004

Abstract

While knowledge discovery in databases techniques require statistically complete data, real-world data is incomplete, so it must be preprocessed using completeness approximation methods. The success of these methods is impacted by whether redundancy in large amounts of data overcomes incompleteness mechanisms. We investigate this impact by comparing rule sets induced from complete data with rule sets induced from incomplete data that is preprocessed using complete case analysis. To control the incomplete data construction, we apply the well-defined incompleteness mechanisms missing-at-random and missing-completely-at-random to complete data. Initial results indicate that a medium level of pattern redundancy fails to fully overcome incompleteness mechanisms, and that characterizing an appropriate redundancy threshold is non-trivial.
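
The data construction described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the function names, the toy dataset, the missingness rates, and the choice of which attribute drives the missing-at-random mechanism are all assumptions made for the example. Missing-completely-at-random (MCAR) blanks each value with a fixed probability regardless of the data, missing-at-random (MAR) blanks a value with a probability that depends only on another, observed attribute, and complete case analysis discards every record that has at least one missing value.

```python
import random


def apply_mcar(rows, rate, rng):
    """MCAR: blank every cell independently with probability `rate`."""
    return [[None if rng.random() < rate else value for value in row]
            for row in rows]


def apply_mar(rows, target, driver, trigger, rate, rng):
    """MAR: blank column `target` with probability `rate`, but only in rows
    where the always-observed column `driver` equals `trigger`."""
    result = []
    for row in rows:
        row = list(row)  # copy so the complete data stays untouched
        if row[driver] == trigger and rng.random() < rate:
            row[target] = None
        result.append(row)
    return result


def complete_case_analysis(rows):
    """Keep only the records that contain no missing values."""
    return [row for row in rows if all(value is not None for value in row)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    # Hypothetical complete data with repeated (redundant) patterns.
    complete = [["sunny", "high", "no"], ["rain", "normal", "yes"]] * 50

    mcar_cases = complete_case_analysis(apply_mcar(complete, 0.1, rng))
    mar_cases = complete_case_analysis(
        apply_mar(complete, target=1, driver=0, trigger="sunny",
                  rate=0.5, rng=rng))

    print(len(complete), len(mcar_cases), len(mar_cases))
```

Under these assumptions, a rule inducer such as ID3 would then be run once on `complete` and once on each reduced set, and the resulting rule sets compared. Note that the MAR variant removes records only from one region of the attribute space, which is one way the surviving records can stop being representative of the complete data.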