Missing data analysis with fuzzy C-Means: A study of its application in a psychological scenario

Authors:
Alessandro G. Di Nuovo
Affiliations:
Università degli Studi di Catania, Viale Andrea Doria 6, 95125 Catania, Italy
Venue:
Expert Systems with Applications: An International Journal
Year:
2011

Abstract

In scientific research, and particularly in psychological studies, data for some variables in the database to be analyzed may be missing. If they are not handled correctly, missing values can weaken or even compromise the validity of the research, especially when the database is small. In this paper we introduce the most common solutions to this problem offered by the most popular statistical software, together with a technique based on the best-known fuzzy clustering algorithm, Fuzzy C-Means (FCM). We then compare these methodologies in order to highlight the distinctive characteristics of each solution. The comparison was carried out in a psychological research setting, using a database of in-patients with a diagnosis of mental retardation. The results show that completion techniques, and in particular the one based on FCM, lead to effective data imputation and avoid deleting records with missing data, a practice that diminishes the power of the research.
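
To make the idea of FCM-based completion concrete, here is a minimal sketch in plain Python. It is not taken from the paper and may differ from the author's procedure: it assumes a partial-distance variant of FCM (distances computed only over observed features, then rescaled) and fills each missing cell with the membership-weighted average of the cluster centroids. The function name `fcm_impute`, its parameters, and the toy data are hypothetical, and every record is assumed to have at least one observed value.

```python
import random


def fcm_impute(rows, n_clusters=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Fill None entries using fuzzy c-means fitted on observed values only.

    Illustrative sketch: assumes each record has at least one observed value.
    """
    n, p = len(rows), len(rows[0])
    rng = random.Random(seed)

    # Random initial memberships; each row is normalised to sum to 1.
    U = []
    for _ in range(n):
        w = [rng.random() + 1e-9 for _ in range(n_clusters)]
        s = sum(w)
        U.append([x / s for x in w])

    V = [[0.0] * p for _ in range(n_clusters)]
    for _ in range(n_iter):
        # Centroid update: weighted mean over the records that observe feature j.
        for k in range(n_clusters):
            for j in range(p):
                num = den = 0.0
                for i in range(n):
                    if rows[i][j] is not None:
                        w = U[i][k] ** m
                        num += w * rows[i][j]
                        den += w
                V[k][j] = num / den if den > 0 else 0.0

        # Membership update from partial squared distances, rescaled by the
        # fraction of features that the record actually observes.
        change = 0.0
        for i in range(n):
            obs = [j for j in range(p) if rows[i][j] is not None]
            scale = p / len(obs)
            inv = []
            for k in range(n_clusters):
                d2 = scale * sum((rows[i][j] - V[k][j]) ** 2 for j in obs)
                inv.append(max(d2, 1e-12) ** (-1.0 / (m - 1.0)))
            total = sum(inv)
            for k in range(n_clusters):
                new = inv[k] / total
                change = max(change, abs(new - U[i][k]))
                U[i][k] = new
        if change < tol:
            break

    # Impute each missing cell as the membership-weighted centroid value;
    # observed cells are copied through unchanged.
    filled = [
        [rows[i][j] if rows[i][j] is not None
         else sum(U[i][k] * V[k][j] for k in range(n_clusters))
         for j in range(p)]
        for i in range(n)
    ]
    return filled, U, V


# Toy data (invented for illustration): two groups, two records with a gap.
data = [[0.0, 0.0], [0.1, None], [0.0, 0.1],
        [10.0, 10.0], [10.1, None], [10.0, 10.1]]
filled, U, V = fcm_impute(data, n_clusters=2)
```

Unlike listwise deletion, which would discard the two incomplete records here, this keeps all six records and estimates the missing values from the cluster structure of the observed data.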