Hierarchical Clustering of Female Urinary Incontinence Data Having Noise and Outliers

  • Authors:
  • Jorma Laurikkala; Martti Juhola

  • Venue:
  • ISMDA '01 Proceedings of the Second International Symposium on Medical Data Analysis
  • Year:
  • 2001

Abstract

We studied pre-processing of a female urinary incontinence data set by removing uninformative variables, outliers, and noise, to allow hierarchical clustering methods to find partitions that resemble the diagnostic classes. Outliers were identified with box plots and Mahalanobis distances, while noisy cases were detected with the repeated edited nearest neighbor rule. The cleaned data were analyzed with six clustering methods. The best results, as measured with the Fowlkes and Mallows similarity measure, were achieved with complete linkage (0.90) and Ward's method (0.84). These methods managed to separate the two largest diagnostic classes, stress and mixed incontinence, from each other. Unfortunately, the single linkage, average linkage, centroid, and median methods were not able to differentiate between these classes. The results are in accord with our earlier results indicating that supervised methods are better suited to classifying these data than cluster analysis. However, the identified outliers, noise, and clusters may be of interest to expert physicians.
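The Fowlkes and Mallows measure named in the abstract compares a clustering with a reference labeling by counting pairs of cases. The block below is a minimal sketch of the standard pair-counting definition, TP / sqrt((TP + FP)(TP + FN)), and not the authors' implementation. The function name `fowlkes_mallows` and the toy labels are hypothetical and are not taken from the study's data.

```python
from itertools import combinations
from math import sqrt


def fowlkes_mallows(labels_true, labels_pred):
    """Fowlkes and Mallows index between two labelings of the same cases.

    A value of 1.0 means the clustering groups cases exactly as the
    reference classes do; values near 0 mean little agreement.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")

    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_class = labels_true[i] == labels_true[j]
        same_cluster = labels_pred[i] == labels_pred[j]
        if same_class and same_cluster:
            tp += 1  # pair kept together in both labelings
        elif same_cluster:
            fp += 1  # pair merged by the clustering only
        elif same_class:
            fn += 1  # pair split by the clustering

    if tp == 0:
        return 0.0
    return tp / sqrt((tp + fp) * (tp + fn))


# Hypothetical toy example: six cases, two diagnostic classes, two clusters.
diagnoses = ["stress", "stress", "stress", "mixed", "mixed", "mixed"]
clusters = [1, 1, 2, 2, 2, 2]
score = fowlkes_mallows(diagnoses, clusters)
print(round(score, 3))
```

In practice the cluster labels would come from cutting a dendrogram produced by one of the linkage methods, and the reference labels from the diagnostic classes.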