Validating synthetic health datasets for longitudinal clustering

Authors:
Shima Ghassem Pour;Anthony Maeder;Louisa Jorm
Affiliations:
University of Western Sydney, Campbelltown, Australia;University of Western Sydney, Campbelltown, Australia;University of Western Sydney, Campbelltown, Australia
Venue:
HIKM '13 Proceedings of the Sixth Australasian Workshop on Health Informatics and Knowledge Management - Volume 142
Year:
2013

Abstract

Clustering methods partition a dataset into subgroups with homogeneous properties, where the number and the particular characteristics of the subgroups are unknown a priori. Cluster validation methods may be used to address the problem of predicting the number of clusters and assessing the quality of each cluster. This paper presents such an approach, incorporating quantitative methods for comparing original and synthetic versions of longitudinal health datasets. The methods are demonstrated with two clustering algorithms, K-means and Latent Class Analysis, applied to synthetic data derived from the baseline data of the 45 and Up Study in NSW, Australia.
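
As a rough illustration of the validation idea in the abstract, the following minimal Python sketch chooses the number of clusters for an "original" and a "synthetic" dataset by maximizing the mean silhouette width of K-means partitions, then compares the two. Everything in it is an assumption made for illustration. The abstract does not say which validation indices the paper uses, so the silhouette width stands in as one common internal index. The two datasets are toy Gaussian draws, not 45 and Up Study data. The helper names (`toy_data`, `farthest_first`, `kmeans`, `mean_silhouette`, `best_k`) are hypothetical, and Latent Class Analysis is not shown.

```python
import math
import random


def toy_data(n_per_group, seed):
    """Hypothetical stand-in data: three well-separated 2-D Gaussian groups."""
    rng = random.Random(seed)
    group_means = [(0.0, 0.0), (10.0, 0.0), (5.0, 10.0)]
    return [(rng.gauss(mx, 0.5), rng.gauss(my, 0.5))
            for mx, my in group_means for _ in range(n_per_group)]


def farthest_first(points, k):
    """Deterministic initialization: repeatedly add the point farthest from all chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means; returns one cluster label per point."""
    centers = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def mean_silhouette(points, labels):
    """Mean silhouette width s = (b - a) / max(a, b), averaged over all points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def best_k(points, candidates=range(2, 7)):
    """Return the candidate k with the highest mean silhouette, plus all scores."""
    scores = {k: mean_silhouette(points, kmeans(points, k)) for k in candidates}
    return max(scores, key=scores.get), scores


original = toy_data(40, seed=1)   # stands in for the original dataset
synthetic = toy_data(40, seed=2)  # stands in for a synthetic version of it

k_orig, scores_orig = best_k(original)
k_syn, scores_syn = best_k(synthetic)

# Largest difference in silhouette between the two versions across candidate k.
max_gap = max(abs(scores_orig[k] - scores_syn[k]) for k in scores_orig)
print(k_orig, k_syn, round(max_gap, 3))
```

If a synthetic dataset preserves the cluster structure of the original, the selected k and the silhouette profile across candidate values of k should be similar for both. With groups as well separated as in this toy data, both versions are expected to select k = 3.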