Feature subspace transformations for enhancing k-means clustering

Authors:
Anirban Chatterjee;Sanjukta Bhowmick;Padma Raghavan
Affiliations:
The Pennsylvania State University, University Park, PA, USA;University of Nebraska, Omaha, NE, USA;The Pennsylvania State University, University Park, PA, USA
Venue:
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Year:
2010

Citing 3
Cited 0

Graph drawing by force-directed placement

Software—Practice & Experience
Machine learning, neural and statistical classification

Machine learning, neural and statistical classification
Weighted Graph Cuts without Eigenvectors A Multilevel Approach

IEEE Transactions on Pattern Analysis and Machine Intelligence

Quantified Score

Hi-index	0.00

Visualization

Abstract

Unsupervised classification typically concerns identifying clusters of similar entities in an unlabeled dataset. Popular methods include clustering based on (i) distance-based metrics between the entities in the feature space (K-Means), and (ii) combinatorial properties in a weighted graph representation of the dataset (Multilevel K-Means). In this paper, we present a force-directed graph layout based feature subspace transformation (FST) scheme to transform the dataset before the application of K-Means. Our FST-K-Means method utilizes both distance-based and combinatorial attributes of the original dataset to seek improvements in the internal and external quality metrics of unsupervised classification. We demonstrate the effectiveness of FST-K-Means in improving classification quality relative to K-Means and Multilevel K-Means (GraClus). The quality of classification is measured by observing internal and external quality metrics on a test suite of datasets. Our results indicate that on average, the internal quality metric (cluster cohesiveness) is 20.2% better than K-Means, and 6.6% better than GraClus. More significantly, FST-K-Means improves the external quality metric (accuracy) of classification on average by 14.9% relative to K-Means and 23.6% relative to GraClus.