Two-level k-means clustering algorithm for k-τ relationship establishment and linear-time classification

Authors:
Radha Chitta; M. Narasimha Murty
Affiliations:
Department of Electrical Engineering, Indian Institute of Science, Bangalore 560012, India; Department of Computer Science and Automation, Indian Institute of Science, Bangalore 560012, India
Venue:
Pattern Recognition
Year:
2010

Abstract

Partitional clustering algorithms, which partition the dataset into a pre-defined number of clusters, can be broadly classified into two types: algorithms that explicitly take the number of clusters as input and algorithms that take the expected size of a cluster as input. In this paper, we propose a variant of the k-means algorithm and prove that it is more efficient than the standard k-means algorithm. An important contribution of this paper is the establishment of a relation between the number of clusters and the size of the clusters in a dataset through the analysis of our algorithm. We also demonstrate that integrating this algorithm as a pre-processing step in classification algorithms reduces their running-time complexity.
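
The two input styles named in the abstract can be illustrated with a minimal sketch. It does not reproduce the paper's two-level algorithm. It assumes plain Lloyd's k-means as the example that takes the number of clusters `k`, and a leader-style threshold method as the example that takes a size parameter `tau`. The function names, the seeding rule, and the sample points are all hypothetical choices made for illustration.

```python
import math

def kmeans(points, k, iters=20):
    """Lloyd's k-means: the number of clusters k is the input."""
    # Assumption: seed centroids with the first k points so the run is deterministic.
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster goes empty
                centroids[j] = [sum(c) / len(g) for c in zip(*g)]
    return centroids

def leader(points, tau):
    """Leader-style clustering: a distance threshold tau is the input."""
    leaders = []
    for p in points:
        # Start a new cluster only when no existing leader lies within tau.
        if all(math.dist(p, q) > tau for q in leaders):
            leaders.append(p)
    return leaders

# Hypothetical sample data: three well-separated pairs of points.
points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (9.0, 0.0), (9.2, 0.1)]

centroids = kmeans(points, k=3)    # cluster count fixed in advance
leaders = leader(points, tau=1.0)  # cluster count follows from tau
print(len(centroids), len(leaders))
```

In the first call the cluster count is fixed by the caller. In the second it is an output that shrinks as `tau` grows, which is one informal way to see why a relation between the number of clusters and the cluster size is of interest.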