BIRCH: an efficient data clustering method for very large databases
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Making large-scale support vector machine learning practical
Advances in kernel methods
Fast training of support vector machines using sequential minimal optimization
Advances in kernel methods
ACM Computing Surveys (CSUR)
A Tutorial on Support Vector Machines for Pattern Recognition
Data Mining and Knowledge Discovery
An Efficient k-Means Clustering Algorithm: Analysis and Implementation
IEEE Transactions on Pattern Analysis and Machine Intelligence
Clustering Data Streams: Theory and Practice
IEEE Transactions on Knowledge and Data Engineering
Classifying large data sets using SVMs with hierarchical clusters
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Fast k-Nearest Neighbor Classification Using Cluster-Based Trees
IEEE Transactions on Pattern Analysis and Machine Intelligence
Core Vector Machines: Fast SVM Training on Very Large Data Sets
The Journal of Machine Learning Research
Training linear SVMs in linear time
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
K-means clustering versus validation measures: a data-distribution perspective
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
Least squares quantization in PCM
IEEE Transactions on Information Theory
Fast parameterless density-based clustering via random projections
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Hi-index | 0.02 |
Partitional clustering algorithms, which partition the dataset into a pre-defined number of clusters, can be broadly classified into two types: algorithms which explicitly take the number of clusters as input and algorithms that take the expected size of a cluster as input. In this paper, we propose a variant of the k-means algorithm and prove that it is more efficient than standard k-means algorithms. An important contribution of this paper is the establishment of a relation between the number of clusters and the size of the clusters in a dataset through the analysis of our algorithm. We also demonstrate that the integration of this algorithm as a pre-processing step in classification algorithms reduces their running-time complexity.