We present two novel techniques for imputing both categorical and numerical missing values. The techniques use decision trees and forests to identify horizontal segments of a data set, that is, groups of records with higher similarity and stronger attribute correlations than the data set as a whole. Missing values are then imputed using these similarities and correlations. To improve imputation quality, some segments are merged using a novel approach. We experimentally compare our techniques with a few existing ones on nine publicly available data sets, using four commonly used evaluation criteria. The results, supported by statistical analyses such as confidence intervals, indicate a clear superiority of our techniques.
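The segment-then-impute idea can be illustrated with a minimal sketch. It is not the authors' method: the learned decision tree is replaced by a hypothetical hand-written function `leaf_of`, the splitting attribute `age` is assumed to be always observed, segment merging is omitted, and the similarity- and correlation-based imputation is simplified to a per-segment mean (numerical) or mode (categorical). All names and data values are made up for illustration.

```python
from collections import Counter, defaultdict
from statistics import mean


def leaf_of(record):
    """Hypothetical stand-in for a learned decision tree: map a record to a leaf id."""
    return "age<40" if record["age"] < 40 else "age>=40"


def impute_by_segment(records, leaf_fn):
    """Fill missing values (None) using only records that fall in the same leaf."""
    # Group records into horizontal segments, one segment per leaf.
    segments = defaultdict(list)
    for rec in records:
        segments[leaf_fn(rec)].append(rec)

    result = []
    for rec in records:
        peers = segments[leaf_fn(rec)]
        filled = dict(rec)  # copy, so the input records stay untouched
        for attr, value in rec.items():
            if value is not None:
                continue
            observed = [p[attr] for p in peers if p[attr] is not None]
            if not observed:
                continue  # no observed value in this segment; leave the gap
            if all(isinstance(v, (int, float)) for v in observed):
                filled[attr] = mean(observed)  # numerical: segment mean
            else:
                filled[attr] = Counter(observed).most_common(1)[0][0]  # categorical: segment mode
        result.append(filled)
    return result


# Made-up toy data; None marks a missing value.
records = [
    {"age": 25, "income": 30.0, "job": "clerk"},
    {"age": 31, "income": None, "job": "clerk"},
    {"age": 38, "income": 40.0, "job": None},
    {"age": 52, "income": 80.0, "job": "manager"},
    {"age": 60, "income": None, "job": "manager"},
]
completed = impute_by_segment(records, leaf_of)
```

Because each missing value is estimated only from records in the same segment, the missing income of the 31-year-old is filled from the two younger records rather than from the whole data set.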