A limiting factor for the application of intelligent data analysis (IDA) methods in many domains is the incompleteness of data repositories. Many records have fields that are not filled in, especially when data entry is manual. In addition, a significant fraction of the entries can be erroneous, and there may be no alternative but to discard these records. However, not every cell in a database is an independent datum: statistical relationships constrain, and often determine, missing values. Data imputation, the filling in of missing values for partially missing data, can thus be an invaluable first step in many IDA projects. New imputation methods are needed that can cope with both the size and the widespread sparsity of industrial databases. To illustrate the incomplete-database problem, we analyze one database of instrumentation maintenance and test records for an industrial process. Despite regulatory requirements for process data collection, this database is less than 50% complete. Next, we discuss possible solutions to the missing data problem. We note several approaches to imputation and classify them into two categories: data-driven and model-based. We then describe two machine-learning-based approaches that we have worked with, which build upon the well-known algorithms AutoClass and C4.5. We design several experiments, all using the maintenance database as a common test-bed but with various data splits and algorithmic variations. Results are generally positive, with imputation accuracies of up to 80%. We conclude by outlining some considerations in selecting imputation methods and by discussing applications of data imputation for intelligent data analysis.
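The abstract gives no implementation details, so the following is only a minimal sketch of the general idea behind a decision-tree approach to imputation: treat the incomplete field as the class attribute, grow a tree on the records where that field is observed, and use the tree to fill the remaining cells. It assumes categorical fields, records stored as Python dicts sharing the same keys, and `None` marking a missing cell. The tree is a simplified, hypothetical C4.5-style learner (gain-ratio splits, no pruning, no continuous thresholds), not the authors' implementation. A missing predictor cell is treated here as one more category value, whereas C4.5 itself distributes such records fractionally across branches. `mode_impute` is a simple most-frequent-value baseline added for contrast, and `imputation_accuracy` scores a method by hiding known values and checking how many are recovered. All function names, field names, and records are made up for illustration.

```python
import math
from collections import Counter

MISSING = None  # assumption: a missing cell is stored as None


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, attr, target):
    """C4.5-style gain ratio of splitting `rows` on categorical `attr`."""
    n = len(rows)
    parts = {}
    for r in rows:
        # Simplification: a missing predictor cell becomes its own branch value.
        parts.setdefault(r[attr], []).append(r[target])
    gain = entropy([r[target] for r in rows]) - sum(
        len(p) / n * entropy(p) for p in parts.values())
    split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in parts.values())
    return gain / split_info if split_info > 0 else 0.0


def build_tree(rows, attrs, target, min_rows=2):
    """Grow an unpruned tree; a leaf is a label, an inner node is a dict."""
    labels = [r[target] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs or len(rows) < min_rows:
        return majority
    best = max(attrs, key=lambda a: gain_ratio(rows, a, target))
    if gain_ratio(rows, best, target) <= 0:
        return majority
    branches = {}
    for r in rows:
        branches.setdefault(r[best], []).append(r)
    rest = [a for a in attrs if a != best]
    return {"attr": best, "default": majority,
            "children": {v: build_tree(sub, rest, target, min_rows)
                         for v, sub in branches.items()}}


def predict(tree, row):
    """Walk the tree; fall back to the node's majority label on unseen values."""
    while isinstance(tree, dict):
        child = tree["children"].get(row[tree["attr"]])
        if child is None:
            return tree["default"]
        tree = child
    return tree


def impute_column(records, target):
    """Fill missing `target` cells using a tree trained on rows where it is observed."""
    train = [r for r in records if r[target] is not MISSING]
    if not train:
        raise ValueError(f"no observed values for {target!r}")
    attrs = [a for a in records[0] if a != target]  # assumes uniform keys
    tree = build_tree(train, attrs, target)
    return [{**r, target: predict(tree, r)} if r[target] is MISSING else dict(r)
            for r in records]


def mode_impute(records, target):
    """Baseline: fill every gap with the most frequent observed value."""
    mode = Counter(r[target] for r in records
                   if r[target] is not MISSING).most_common(1)[0][0]
    return [{**r, target: mode} if r[target] is MISSING else dict(r) for r in records]


def imputation_accuracy(records, target, hide):
    """Hide the (known) `target` values at row indices `hide`, re-impute, and score."""
    masked = [{**r, target: MISSING} if i in hide else r
              for i, r in enumerate(records)]
    filled = impute_column(masked, target)
    return sum(filled[i][target] == records[i][target] for i in hide) / len(hide)


# Hypothetical toy records, not taken from the paper's maintenance database.
records = [
    {"sensor": "flow", "site": "A", "status": "pass"},
    {"sensor": "flow", "site": "B", "status": "pass"},
    {"sensor": "temp", "site": "A", "status": "fail"},
    {"sensor": "temp", "site": "B", "status": "fail"},
    {"sensor": "flow", "site": "A", "status": "pass"},
    {"sensor": "temp", "site": None, "status": "fail"},
    {"sensor": "flow", "site": "B", "status": None},
    {"sensor": "temp", "site": "A", "status": None},
]

print([r["status"] for r in impute_column(records, "status")[-2:]])  # ['pass', 'fail']
print([r["status"] for r in mode_impute(records, "status")[-2:]])    # ['pass', 'pass']
print(imputation_accuracy(records, "status", {0, 2}))                # 1.0
```

On these toy records the mode baseline fills both gaps with the same value, while the tree exploits the dependence of `status` on `sensor`, which is the point the abstract makes about cells not being independent. Real maintenance data would also need handling of continuous fields, pruning, and the fractional treatment of missing predictors that C4.5 uses.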