Imputation of Missing Data in Industrial Databases

Authors:
Kamakshi Lakshminarayan;Steven A. Harp;Tariq Samad
Affiliations:
Honeywell Technology Center, 3660 Technology Drive, Minneapolis, MN 55418. laksh004@tc.umn.edu;Honeywell Technology Center, 3660 Technology Drive, Minneapolis, MN 55418. sharp@htc.honeywell.com;Honeywell Technology Center, 3660 Technology Drive, Minneapolis, MN 55418. samad@htc.honeywell.com
Venue:
Applied Intelligence
Year:
1999

References:
Statistical analysis with missing data
C4.5: programs for machine learning
Bayesian classification (AutoClass): theory and results. In: Advances in knowledge discovery and data mining

Abstract

A limiting factor for the application of intelligent data analysis (IDA) methods in many domains is the incompleteness of data repositories. Many records have fields that are not filled in, especially when data entry is manual. In addition, a significant fraction of the entries can be erroneous, and there may be no alternative but to discard these records. But the cells in a database are not independent data: statistical relationships will constrain, and often determine, missing values. Data imputation, the filling in of missing values for partially missing data, can thus be an invaluable first step in many IDA projects. New imputation methods that can handle the large-scale problems and large-scale sparsity of industrial databases are needed. To illustrate the incomplete database problem, we analyze one database with instrumentation maintenance and test records for an industrial process. Despite regulatory requirements for process data collection, this database is less than 50% complete. Next, we discuss possible solutions to the missing data problem. Several approaches to imputation are noted and classified into two categories: data-driven and model-based. We then describe two machine-learning-based approaches that we have worked with. These build upon well-known algorithms: AutoClass and C4.5. Several experiments are designed, all using the maintenance database as a common test-bed but with various data splits and algorithmic variations. Results are generally positive, with imputation accuracies of up to 80%. We conclude the paper by outlining some considerations in selecting imputation methods, and by discussing applications of data imputation for intelligent data analysis.
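
The idea that statistical relationships between fields can constrain a missing value can be illustrated with a small sketch. This is not the authors' method and uses neither AutoClass nor C4.5. It assumes a hypothetical toy table with invented field names (`device`, `result`) and compares two simple strategies: filling every gap with the most common observed value, and filling each gap with the most common value among complete records that share the same value of one other field.

```python
from collections import Counter, defaultdict

# Hypothetical toy maintenance records (invented for illustration);
# None marks a missing field.
records = [
    {"device": "valve", "result": "pass"},
    {"device": "valve", "result": "pass"},
    {"device": "pump", "result": "fail"},
    {"device": "pump", "result": "fail"},
    {"device": "pump", "result": "pass"},
    {"device": "valve", "result": None},
    {"device": "pump", "result": None},
]


def observed_mode(rows, target):
    """Most common non-missing value of `target` across all rows."""
    observed = [r[target] for r in rows if r[target] is not None]
    return Counter(observed).most_common(1)[0][0]


def impute_global_mode(rows, target):
    """Baseline: fill every gap with the overall most common value."""
    mode = observed_mode(rows, target)
    return [
        dict(r, **{target: mode}) if r[target] is None else dict(r)
        for r in rows
    ]


def impute_conditional_mode(rows, target, predictor):
    """Fill each gap with the most common `target` value among complete
    rows that share the same `predictor` value (a one-field lookup,
    far simpler than a decision tree or a mixture model)."""
    counts = defaultdict(Counter)
    for r in rows:
        if r[target] is not None:
            counts[r[predictor]][r[target]] += 1
    fallback = observed_mode(rows, target)
    filled = []
    for r in rows:
        if r[target] is None:
            by_value = counts.get(r[predictor])
            value = by_value.most_common(1)[0][0] if by_value else fallback
            filled.append(dict(r, **{target: value}))
        else:
            filled.append(dict(r))
    return filled


global_filled = impute_global_mode(records, "result")
conditional_filled = impute_conditional_mode(records, "result", "device")
print([r["result"] for r in global_filled[-2:]])
print([r["result"] for r in conditional_filled[-2:]])
```

On this toy table the two strategies disagree on the last record, because the conditional version uses the relationship between `device` and `result` rather than the overall frequency alone. Both functions return new rows and leave the input records unchanged.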