Enhancing K-Means using class labels

Authors:
Billy Peralta;Pablo Espinace;Alvaro Soto
Affiliations:
Pontificia Universidad Católica de Chile, Región Metropolitana, Chile;Pontificia Universidad Católica de Chile, Región Metropolitana, Chile;Pontificia Universidad Católica de Chile, Región Metropolitana, Chile
Venue:
Intelligent Data Analysis
Year:
2013

Citing 21
Cited 0

Silhouettes: a graphical aid to the interpretation and validation of cluster analysis

Journal of Computational and Applied Mathematics
Combining labeled and unlabeled data with co-training

COLT' 98 Proceedings of the eleventh annual conference on Computational learning theory
Document clustering using word clusters via the information bottleneck method

SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Text Classification from Labeled and Unlabeled Documents using EM

Machine Learning - Special issue on information retrieval
Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope

International Journal of Computer Vision
Discriminative Clustering: Optimal Contingency Tables by Learning Metrics

ECML '02 Proceedings of the 13th European Conference on Machine Learning
Video Google: A Text Retrieval Approach to Object Matching in Videos

ICCV '03 Proceedings of the Ninth IEEE International Conference on Computer Vision - Volume 2
Supervised Clustering " Algorithms and Benefits

ICTAI '04 Proceedings of the 16th IEEE International Conference on Tools with Artificial Intelligence
Histograms of Oriented Gradients for Human Detection

CVPR '05 Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Volume 1 - Volume 01
Data Mining: Concepts and Techniques

Data Mining: Concepts and Techniques
Creating Efficient Codebooks for Visual Recognition

ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1 - Volume 01
Comparing clusterings: an axiomatic view

ICML '05 Proceedings of the 22nd international conference on Machine learning
Principled Hybrids of Generative and Discriminative Models

CVPR '06 Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 1
Pattern Recognition and Machine Learning (Information Science and Statistics)

Pattern Recognition and Machine Learning (Information Science and Statistics)
Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage

Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage
Top 10 algorithms in data mining

Knowledge and Information Systems
Introduction to Information Retrieval

Introduction to Information Retrieval
Information theoretic measures for clusterings comparison: is a correction for chance necessary?

ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Object Detection with Discriminatively Trained Part-Based Models

IEEE Transactions on Pattern Analysis and Machine Intelligence
Information Theoretic Measures for Clusterings Comparison: Variants, Properties, Normalization and Correction for Chance

The Journal of Machine Learning Research
Mining of Massive Datasets

Mining of Massive Datasets

Quantified Score

Hi-index	0.00

Visualization

Abstract

Clustering is a relevant problem in machine learning where the main goal is to locate meaningful partitions of unlabeled data. In the case of labeled data, a related problem is supervised clustering, where the objective is to locate class-uniform clusters. Most current approaches to supervised clustering optimize a score related to cluster purity with respect to class labels. In particular, we present Labeled K-Means LK-Means, an algorithm for supervised clustering based on a variant of K-Means that incorporates information about class labels. LK-Means replaces the classical cost function of K-Means by a convex combination of the joint cost associated to: i A discriminative score based on class labels, and ii A generative score based on a traditional metric for unsupervised clustering. We test the performance of LK-Means using standard real datasets and an application for object recognition. Moreover, we also compare its performance against classical K-Means and a popular K-Medoids-based supervised clustering method. Our experiments show that, in most cases, LK-Means outperforms the alternative techniques by a considerable margin. Furthermore, LK-Means presents execution times considerably lower than the alternative supervised clustering method under evaluation.