Instance driven clustering for the imputation of missing data in KDD

Authors:
P. Ilango;K. Vijayakumar;M. Rajasekhara Babu
Affiliations:
School of Computing Science and Engineering, VIT University, Vellore - 632014, Tamilnadu, India (all authors)
Venue:
International Journal of Communication Networks and Distributed Systems
Year:
2014


Abstract

Ongoing research and development in medical data mining has opened up versatile computer-assisted approaches for effective clinical decision-making. The performance of data mining algorithms depends largely on the nature and quality of the sample selected for training. Large quantities of cumulative data collected from various sources suffer from quality deficiencies such as inconsistency, incompleteness and redundancy. Addressing missing data is vital because it can bias the model under evaluation and, at times, lead to inaccurate results. This paper proposes imputing missing data through an instance-based clustering methodology. The proposed method is evaluated on a complete dataset, Pima Indian Type II Diabetes, and its usefulness and performance are estimated through the average imputation error E. The results show that the proposed clustering method gives a lower and more stable error rate than other existing imputation methods.
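The abstract does not spell out the clustering procedure or the formula for E, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes plain k-means on the complete instances, assigns each incomplete instance to its nearest centroid using only the observed attributes, and fills the gaps from that centroid. E is assumed here to be the mean absolute difference between imputed and true values on the hidden cells. The function names `kmeans`, `cluster_impute` and `average_imputation_error`, and the synthetic data, are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on complete rows; returns the centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance of every row to every centroid, shape (n_rows, k).
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Keep the old centroid if a cluster ends up empty.
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids

def cluster_impute(X_missing, k=2, seed=0):
    """Fill NaN cells from the nearest centroid (assumed scheme)."""
    X = X_missing.copy()
    complete = ~np.isnan(X).any(axis=1)
    centroids = kmeans(X[complete], k, seed=seed)
    for i in np.where(~complete)[0]:
        obs = ~np.isnan(X[i])
        # Compare only on the attributes this instance actually has.
        d = np.linalg.norm(centroids[:, obs] - X[i, obs], axis=1)
        X[i, ~obs] = centroids[d.argmin(), ~obs]
    return X

def average_imputation_error(X_true, X_imputed, mask):
    """Assumed definition of E: mean absolute error over hidden cells."""
    return np.abs(X_true[mask] - X_imputed[mask]).mean()

# Synthetic stand-in for a complete dataset: two well-separated groups.
rng = np.random.default_rng(1)
X_true = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(10, 1, (50, 3))])

# Hide one value in each of 10 randomly chosen rows.
mask = np.zeros_like(X_true, dtype=bool)
rows = rng.choice(len(X_true), size=10, replace=False)
mask[rows, rng.integers(0, 3, size=10)] = True
X_missing = X_true.copy()
X_missing[mask] = np.nan

X_imputed = cluster_impute(X_missing, k=2)
E = average_imputation_error(X_true, X_imputed, mask)
print(f"average imputation error E = {E:.3f}")
```

Starting from a complete dataset and hiding known values, as the abstract describes for the Pima data, is what makes E measurable: the true values are available for comparison after imputation.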