A modification of the k-means method for quasi-unsupervised learning

Authors:
David Rebollo-Monedero; Marc Solé; Jordi Nin; Jordi Forné
Affiliations:
Department of Telematics Engineering, Technical University of Catalonia (UPC), E-08034 Barcelona, Spain; Department of Computer Architecture, Technical University of Catalonia (UPC), E-08034 Barcelona, Spain; Department of Computer Architecture, Technical University of Catalonia (UPC), E-08034 Barcelona, Spain; Department of Telematics Engineering, Technical University of Catalonia (UPC), E-08034 Barcelona, Spain
Venue:
Knowledge-Based Systems
Year:
2013

Abstract

Since the advent of data clustering, the original formulation of the clustering problem has been extended in a number of ways to widen its range of application. In particular, recent heuristic approaches incorporate restrictions on the size of the clusters while striving to minimize a measure of dissimilarity within them. Such size constraints are effectively a way to exploit prior knowledge, readily available in many scenarios, and can lead to improved clustering performance. In this paper, we build upon a modification of the celebrated k-means method that resorts to a similar alternating optimization procedure. The modification is endowed with additive partition weights that control the size of the partitions formed, and these weights are adjusted by means of the Levenberg-Marquardt algorithm. We propose several further variations on this modification, in which different kinds of additional information are present. We report experimental results on various standardized datasets, demonstrating that our approaches outperform existing heuristics for size-constrained clustering. The running-time complexity of our proposal is assessed experimentally by means of a power-law regression analysis.
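
The idea of additive partition weights can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes squared Euclidean distance and, for brevity, replaces the Levenberg-Marquardt adjustment with a simple proportional update of the weights. The function name `size_weighted_kmeans` and the parameters `target_sizes` and `step` are hypothetical and introduced only for this illustration.

```python
import numpy as np

def size_weighted_kmeans(X, k, target_sizes, n_iter=50, step=0.1, seed=0):
    """Alternating optimization with additive per-cluster weights.

    Assumption: the weights are nudged by a plain proportional rule,
    NOT by the Levenberg-Marquardt algorithm used in the paper.
    """
    X = np.asarray(X, dtype=float)
    target = np.asarray(target_sizes, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    weights = np.zeros(k)
    # Total variance sets the scale of the weight updates.
    scale = X.var(axis=0).sum()
    for _ in range(n_iter):
        # Squared distance from every point to every centroid, shape (n, k).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        # Assignment step: minimize distance plus the additive weight.
        labels = np.argmin(d2 + weights, axis=1)
        sizes = np.bincount(labels, minlength=k)
        # Centroid step: mean of the points assigned to each cluster.
        for j in range(k):
            if sizes[j] > 0:
                centroids[j] = X[labels == j].mean(axis=0)
        # Oversized clusters get a larger penalty, undersized ones a smaller one.
        weights += step * scale * (sizes - target) / len(X)
        # Only weight differences matter, so keep the weights centered.
        weights -= weights.mean()
    return labels, centroids, weights

if __name__ == "__main__":
    data = np.random.default_rng(1).normal(size=(90, 2))
    labels, _, weights = size_weighted_kmeans(data, 3, [30, 30, 30])
    print(np.bincount(labels, minlength=3), weights)
```

With all weights fixed at zero, the assignment step reduces to that of ordinary k-means. This heuristic update is not guaranteed to reach the target sizes exactly.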