A set correlation model for partitional clustering

Authors:
Nguyen Xuan Vinh; Michael E. Houle
Affiliations:
School of Electrical Engineering and Telecommunications, The University of New South Wales, Sydney, NSW, Australia; National Institute of Informatics, Tokyo, Japan
Venue:
PAKDD'10 Proceedings of the 14th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part I
Year:
2010

Abstract

This paper introduces GlobalRSC, a novel formulation for partitional data clustering based on the Relevant Set Correlation (RSC) clustering model. Our formulation resembles that of the K-means clustering model, but uses a shared-neighbor similarity measure in place of the Euclidean distance. Unlike K-means and most other clustering heuristics, which work only with real-valued data and with distance measures drawn from specific families, GlobalRSC can work with any distance measure and any data representation. We also discuss various techniques for boosting the scalability of GlobalRSC.
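
To illustrate what a shared-neighbor similarity over arbitrary data might look like, the following is a minimal Python sketch. It is not the GlobalRSC objective or the set correlation measure of the RSC model: the plain overlap fraction used here is a simplified stand-in, and the names `knn_sets`, `shared_neighbor_similarity`, and `hamming`, as well as the toy strings, are assumptions introduced only for this example. The point it shows is that the similarity depends only on neighbor rankings, so any user-supplied distance function over any data representation can be plugged in.

```python
from typing import Callable, FrozenSet, List, Sequence, TypeVar

T = TypeVar("T")


def knn_sets(
    items: Sequence[T], dist: Callable[[T, T], float], k: int
) -> List[FrozenSet[int]]:
    """For each item, return the indices of its k nearest items (itself included)."""
    n = len(items)
    result = []
    for i in range(n):
        # Rank every index by its distance to item i; ties are broken by index.
        ranked = sorted(range(n), key=lambda j: (dist(items[i], items[j]), j))
        result.append(frozenset(ranked[:k]))
    return result


def shared_neighbor_similarity(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Fraction of neighbors shared by two equal-sized neighbor sets (assumed stand-in)."""
    return len(a & b) / len(a)


def hamming(x: str, y: str) -> int:
    """Hamming distance between two equal-length strings (non-numeric data)."""
    return sum(c1 != c2 for c1, c2 in zip(x, y))


# Toy data: two obvious groups of strings, compared with a non-Euclidean distance.
words = ["aaaa", "aaab", "aaba", "zzzz", "zzzy", "zzyz"]
neighbors = knn_sets(words, hamming, k=3)

same_group = shared_neighbor_similarity(neighbors[0], neighbors[1])
different_group = shared_neighbor_similarity(neighbors[0], neighbors[3])
print(same_group, different_group)
```

In this toy run, items from the same group share all of their neighbors and items from different groups share none. A K-means-style procedure could in principle score cluster memberships with such a similarity instead of Euclidean distance to a centroid, although the actual GlobalRSC formulation is given in the paper itself.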