A polythetic clustering process and cluster validity indexes for histogram-valued objects

Authors: Jaejik Kim; L. Billard
Affiliations: Department of Biostatistics, Georgia Health Sciences University, Augusta, GA 30912, USA; Department of Statistics, University of Georgia, Athens, GA 30602, USA
Venue: Computational Statistics & Data Analysis
Year: 2011

Abstract

Clustering is an exploratory procedure that helps in understanding data with complex structure and multivariate relationships, and it is especially useful for extracting knowledge and information from large datasets. When such datasets are aggregated into categories (as driven by the scientific questions underlying the analysis), the resulting observations are necessarily expressed as so-called symbolic data (although symbolic data can also arise "naturally" in datasets of any size). This work proposes a divisive polythetic algorithm for clustering p-dimensional histogram-valued data. Two cluster validity indexes are also developed for determining the optimal number of clusters. Finally, the proposed procedure is applied to a large forestry cover type dataset.
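The abstract does not spell out the algorithm, so the following is only a rough sketch of how a divisive polythetic procedure and a validity index can fit together. It is not the authors' method. It assumes that each observation holds p histograms already placed on common bins, and it uses a hypothetical dissimilarity (L1 distance between relative frequencies, summed over variables). Clusters are bisected with a DIANA-style splinter step (Kaufman and Rousseeuw), and the classical Dunn index stands in for a validity index. All function names (`hist_dissimilarity`, `splinter_split`, `divisive_clustering`, `dunn_index`) are illustrative.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

# ASSUMPTION: each observation is a list of p histograms whose bins are already
# aligned across observations; each histogram is a list of relative frequencies.
Observation = Sequence[Sequence[float]]
Matrix = List[List[float]]


def hist_dissimilarity(a: Observation, b: Observation) -> float:
    """Hypothetical stand-in measure: L1 distance between relative frequencies,
    summed over the p variables (not the dissimilarity used in the paper)."""
    return sum(sum(abs(x - y) for x, y in zip(ha, hb)) for ha, hb in zip(a, b))


def dissimilarity_matrix(data: Sequence[Observation]) -> Matrix:
    """Pairwise dissimilarities between all n observations."""
    return [[hist_dissimilarity(a, b) for b in data] for a in data]


def diameter(cluster: List[int], d: Matrix) -> float:
    """Largest pairwise dissimilarity within a cluster (0.0 for a singleton)."""
    return max((d[i][j] for i, j in combinations(cluster, 2)), default=0.0)


def mean_to(i: int, group: List[int], d: Matrix) -> float:
    """Average dissimilarity from object i to the members of group other than i."""
    others = [j for j in group if j != i]
    return sum(d[i][j] for j in others) / len(others) if others else 0.0


def splinter_split(cluster: List[int], d: Matrix) -> Tuple[List[int], List[int]]:
    """DIANA-style bisection. It is polythetic because every decision uses a
    dissimilarity built from all p variables, not a cut on a single variable."""
    remaining = list(cluster)
    # Seed the splinter group with the object farthest, on average, from the rest.
    seed = max(remaining, key=lambda i: mean_to(i, remaining, d))
    splinter = [seed]
    remaining.remove(seed)
    while len(remaining) > 1:
        # Positive gain: object i is closer, on average, to the splinter group.
        gain = {i: mean_to(i, remaining, d) - mean_to(i, splinter, d) for i in remaining}
        best = max(gain, key=gain.get)
        if gain[best] <= 0:
            break
        splinter.append(best)
        remaining.remove(best)
    return splinter, remaining


def divisive_clustering(d: Matrix, k: int) -> List[List[int]]:
    """Start from one cluster; repeatedly bisect the widest cluster until k clusters."""
    clusters = [list(range(len(d)))]
    while len(clusters) < k:
        target = max(clusters, key=lambda c: diameter(c, d))
        if len(target) < 2:
            break  # every cluster is a singleton; nothing left to split
        clusters.remove(target)
        clusters.extend(splinter_split(target, d))
    return clusters


def dunn_index(clusters: List[List[int]], d: Matrix) -> float:
    """Classical Dunn index (larger is better), used only as a generic stand-in
    for a validity index; it is not one of the paper's two indexes."""
    if len(clusters) < 2:
        raise ValueError("the Dunn index needs at least two clusters")
    max_diam = max(diameter(c, d) for c in clusters)
    min_sep = min(d[i][j] for a, b in combinations(clusters, 2) for i in a for j in b)
    return float("inf") if max_diam == 0 else min_sep / max_diam
```

For example, consider four observations on two variables, each with two bins, where observations 0 and 1 put most of their mass in the first bin and observations 2 and 3 put most of theirs in the second. In that case `divisive_clustering(d, 2)` separates {0, 1} from {2, 3}, and `dunn_index` scores k = 2 higher than k = 3. That is the kind of comparison a validity index supports when choosing the number of clusters.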