Differentiated treatment of missing values in fuzzy clustering

Authors:
Heiko Timm;Christian Döring;Rudolf Kruse
Affiliations:
Dept. of Knowledge Processing and Language Engineering, Otto-von-Guericke-University of Magdeburg, Magdeburg, Germany;Dept. of Knowledge Processing and Language Engineering, Otto-von-Guericke-University of Magdeburg, Magdeburg, Germany;Dept. of Knowledge Processing and Language Engineering, Otto-von-Guericke-University of Magdeburg, Magdeburg, Germany
Venue:
IFSA'03 Proceedings of the 10th international fuzzy systems association World Congress conference on Fuzzy sets and systems
Year:
2003

Abstract

Incomplete datasets are a prevalent problem in data analysis. Since several reasons for missing attribute values can be distinguished, we suggest a differentiated treatment of this common problem. For datasets in which feature values are missing completely at random, a variety of approaches have been proposed. In other situations, however, the fact that values are missing provides additional information for the classification of the dataset. Since the known approaches cannot exploit this information, we developed an extension of the Gath and Geva algorithm that can utilize it. We introduce a class-specific probability for missing values in order to assign incomplete data points to clusters appropriately. Benchmark datasets are used to demonstrate the capability of the presented approach.
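
The abstract does not give the update equations, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes Gaussian clusters with diagonal variances, a per-cluster and per-attribute missingness probability strictly between 0 and 1, and posterior-style memberships (which corresponds to a fuzzifier of 2 in a Gath-Geva-like scheme). The function name `memberships` and the dictionary keys are hypothetical.

```python
import math

def memberships(x, clusters):
    """Assign one (possibly incomplete) point to clusters.

    x        : list of floats, with None marking a missing attribute
    clusters : list of dicts with keys 'prior', 'mean', 'var', 'p_missing'
               (all hypothetical names; 'var' and 'p_missing' are per attribute,
               and every p_missing value is assumed to lie strictly in (0, 1))
    """
    log_scores = []
    for c in clusters:
        s = math.log(c["prior"])
        for value, mean, var, p_miss in zip(x, c["mean"], c["var"], c["p_missing"]):
            if value is None:
                # A missing value contributes the cluster's missingness probability.
                s += math.log(p_miss)
            else:
                # An observed value contributes (1 - p_miss) times a Gaussian density.
                s += math.log(1.0 - p_miss)
                s += -0.5 * math.log(2.0 * math.pi * var) - (value - mean) ** 2 / (2.0 * var)
        log_scores.append(s)
    # Normalize in a numerically stable way so the memberships sum to 1.
    top = max(log_scores)
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]

# Two clusters that look identical on attribute 0 but differ in how often
# attribute 1 is missing (0.8 versus 0.1).
clusters = [
    {"prior": 0.5, "mean": [0.0, 1.0], "var": [1.0, 1.0], "p_missing": [0.05, 0.8]},
    {"prior": 0.5, "mean": [0.0, 5.0], "var": [1.0, 1.0], "p_missing": [0.05, 0.1]},
]
u = memberships([0.0, None], clusters)
print(u)
```

In this toy setting the observed attribute cannot separate the clusters, so the memberships are driven entirely by the missingness probabilities: the point is assigned to the first cluster with degree 0.8 / 0.9 and to the second with degree 0.1 / 0.9. An approach that ignores the missingness pattern would split the point evenly.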