Hierarchical Clustering of Female Urinary Incontinence Data Having Noise and Outliers

Authors:
Jorma Laurikkala;Martti Juhola
Affiliations:
-;-
Venue:
ISMDA '01 Proceedings of the Second International Symposium on Medical Data Analysis
Year:
2001

Abstract

We studied pre-processing of a female urinary incontinence data set by removing uninformative variables, outliers, and noise, so that hierarchical clustering methods could find partitions that resemble the diagnostic classes. Outliers were identified with box plots and Mahalanobis distances, while noisy cases were detected with the repeated edited nearest neighbor rule. The cleaned data were analyzed with six clustering methods. The best results, as measured with the Fowlkes and Mallows similarity measure, were achieved with complete linkage (0.90) and Ward's method (0.84). These two methods managed to separate the two largest diagnostic classes, stress and mixed incontinence, from each other. The single linkage, average linkage, centroid, and median methods were not able to differentiate between these classes. The results agree with our earlier results, which indicated that supervised methods are better suited than cluster analysis to classifying these data. However, the identified outliers, noise, and clusters may be of interest to expert physicians.
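As a rough illustration of two techniques named in the abstract, the sketch below implements the repeated edited nearest neighbor rule for flagging noisy cases and the Fowlkes and Mallows similarity measure for comparing a clustering with the diagnostic classes. It is not the authors' implementation. The function names, the choice of k = 3, the plain Euclidean distance on numeric tuples, and the tie-breaking rule are all assumptions made for this example; the abstract does not state which distance function or neighborhood size was used.

```python
from collections import Counter
from itertools import combinations
from math import dist, sqrt  # math.dist requires Python 3.8+


def fowlkes_mallows(labels_a, labels_b):
    """Fowlkes-Mallows index between two partitions of the same cases.

    Counts pairs of cases: tp = together in both partitions,
    only_a / only_b = together in just one of them.
    """
    tp = only_a = only_b = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            tp += 1
        elif same_a:
            only_a += 1
        elif same_b:
            only_b += 1
    if tp == 0:
        return 0.0
    return tp / sqrt((tp + only_a) * (tp + only_b))


def edited_nearest_neighbor(points, labels, k=3):
    """One editing pass: return indices of cases whose class agrees with
    the majority class of their k nearest neighbors.

    Assumption: Euclidean distance; ties in the vote go to the class
    seen first among the nearest neighbors.
    """
    keep = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        neighbors = sorted(others, key=lambda j: dist(p, points[j]))[:k]
        if not neighbors:  # a lone case cannot be judged, so keep it
            keep.append(i)
            continue
        majority = Counter(labels[j] for j in neighbors).most_common(1)[0][0]
        if majority == labels[i]:
            keep.append(i)
    return keep


def repeated_enn(points, labels, k=3):
    """Repeat the editing pass until no further case is removed.

    Returns the indices (into the original data) of the retained cases.
    """
    idx = list(range(len(points)))
    while True:
        pts = [points[i] for i in idx]
        labs = [labels[i] for i in idx]
        kept = edited_nearest_neighbor(pts, labs, k)
        if len(kept) == len(idx):
            return idx
        idx = [idx[i] for i in kept]
```

In use, `repeated_enn` would be called on the numeric case vectors and their diagnostic classes to obtain the retained cases, and `fowlkes_mallows` would then compare the cluster labels from a hierarchical method with the diagnostic classes of those cases. A value of 1.0 means the two partitions are identical.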