Algorithms for clustering data
SCG '94 Proceedings of the tenth annual symposium on Computational geometry
Accelerating exact k-means algorithms with geometric reasoning
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Sublinear time approximate clustering
SODA '01 Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms
Approximate clustering via core-sets
STOC '02 Proceedings of the thirty-fourth annual ACM symposium on Theory of computing
A local search approximation algorithm for k-means clustering
Proceedings of the eighteenth annual symposium on Computational geometry
Projective clustering in high dimensions using core-sets
Proceedings of the eighteenth annual symposium on Computational geometry
Clustering Algorithms
BIRCH: A New Data Clustering Algorithm and Its Applications
Data Mining and Knowledge Discovery
An Efficient k-Means Clustering Algorithm: Analysis and Implementation
IEEE Transactions on Pattern Analysis and Machine Intelligence
SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
X-means: Extending K-means with Efficient Estimation of the Number of Clusters
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Acceleration of K-Means and Related Clustering Algorithms
ALENEX '02 Revised Papers from the 4th International Workshop on Algorithm Engineering and Experiments
Approximation schemes for clustering problems
Proceedings of the thirty-fifth annual ACM symposium on Theory of computing
Faster core-set constructions and data stream algorithms in fixed dimensions
SCG '04 Proceedings of the twentieth annual symposium on Computational geometry
Discrete & Computational Geometry
On coresets for k-means and k-median clustering
STOC '04 Proceedings of the thirty-sixth annual ACM symposium on Theory of computing
Optimal Time Bounds for Approximate Clustering
Machine Learning
Approximating extent measures of points
Journal of the ACM (JACM)
A Simple Linear Time (1+ε)-Approximation Algorithm for k-Means Clustering in Any Dimensions
FOCS '04 Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science
Coresets in dynamic geometric data streams
Proceedings of the thirty-seventh annual ACM symposium on Theory of computing
Smaller coresets for k-median and k-means clustering
SCG '05 Proceedings of the twenty-first annual symposium on Computational geometry
Approximating largest convex hulls for imprecise points
Journal of Discrete Algorithms
Domain-specific sentiment analysis using contextual feature generation
Proceedings of the 1st international CIKM workshop on Topic-sentiment analysis for mass opinion
Approximating largest convex hulls for imprecise points
WAOA'07 Proceedings of the 5th international conference on Approximation and online algorithms
Automatic k-means for color enteromorpha image segmentation
IITA'09 Proceedings of the 3rd international conference on Intelligent information technology application
Dynamic decentralized mapping of tree-structured applications on NoC architectures
NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
The three steps of clustering in the post-genomic era: a synopsis
CIBB'10 Proceedings of the 7th international conference on Computational intelligence methods for bioinformatics and biostatistics
Dynamic k-means: a clustering technique for moving object trajectories
International Journal of Intelligent Information and Database Systems
k-means clustering on pre-calculated distance-based nearest neighbor search for image search
ACIIDS'13 Proceedings of the 5th Asian conference on Intelligent Information and Database Systems - Volume Part II
Learning Big (Image) Data via Coresets for Dictionaries
Journal of Mathematical Imaging and Vision
In this paper we develop an efficient implementation of a k-means clustering algorithm. Our algorithm is a variant of KMHybrid [28, 20], i.e., it uses a combination of Lloyd steps and random swaps, but as a novel feature it uses coresets to speed up the algorithm. A coreset is a small weighted set of points that approximates the original point set with respect to the considered problem. The main strength of the algorithm is that it can quickly determine clusterings of the same point set for many values of k. This is necessary in many applications since, typically, one does not know a good value for k in advance. Once we have clusterings for many different values of k, we can determine a good choice of k using a quality measure of clusterings that is independent of k, for example the average silhouette coefficient. The average silhouette coefficient can be approximated using coresets.

To evaluate the performance of our algorithm we compare it with algorithm KMHybrid [28] on typical 3D data sets for an image compression application and on artificially created instances. Our data sets consist of 300,000 to 4.9 million points. We show that our algorithm significantly outperforms KMHybrid on most of these input instances. Additionally, the quality of the solutions computed by our algorithm deviates less than that of KMHybrid.

We also computed clusterings and approximate average silhouette coefficients for k=1,…,100 for our input instances and discuss the performance of our algorithm in detail.
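
The ingredients named in the abstract (Lloyd steps, random swaps, a weighted point set standing in for the full input, and the average silhouette coefficient as a k-independent quality measure) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names, the swap acceptance rule, the fixed number of rounds, and the way the silhouette is evaluated on weighted points are all assumptions made for illustration, and the coreset construction itself is not shown. Weights are assumed to be counts of at least 1, so a point of weight w stands for w coincident input points.

```python
import math
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centers):
    """Index of the center closest to p."""
    return min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))

def cost(points, weights, centers):
    """Weighted k-means cost: sum of w * squared distance to the nearest center."""
    return sum(w * min(sq_dist(p, c) for c in centers)
               for p, w in zip(points, weights))

def lloyd_step(points, weights, centers):
    """One Lloyd step: assign to nearest center, move centers to weighted means."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in centers]
    mass = [0.0] * len(centers)
    for p, w in zip(points, weights):
        c = nearest(p, centers)
        mass[c] += w
        for d in range(dim):
            sums[c][d] += w * p[d]
    # An empty cluster keeps its previous center.
    return [tuple(x / m for x in s) if m > 0 else centers[c]
            for c, (s, m) in enumerate(zip(sums, mass))]

def random_swap(points, weights, centers, rng):
    """Swap one random center for a random input point; keep it only if cost drops."""
    trial = list(centers)
    trial[rng.randrange(len(trial))] = rng.choice(points)
    if cost(points, weights, trial) < cost(points, weights, centers):
        return trial
    return centers

def hybrid_kmeans(points, weights, k, rounds=20, seed=0):
    """Alternate random swaps and Lloyd steps (hypothetical schedule)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(rounds):
        centers = random_swap(points, weights, centers, rng)
        centers = lloyd_step(points, weights, centers)
    return centers

def avg_silhouette(points, weights, centers):
    """Average silhouette of the weighted set, clusters given by nearest center."""
    k = len(centers)
    labels = [nearest(p, centers) for p in points]
    mass = [0.0] * k
    for l, w in zip(labels, weights):
        mass[l] += w
    total = 0.0
    for p, w_p, own in zip(points, weights, labels):
        # Weighted sum of distances from p to the points of each cluster.
        dist = [0.0] * k
        for q, w, l in zip(points, weights, labels):
            dist[l] += w * math.sqrt(sq_dist(p, q))
        others = [dist[c] / mass[c] for c in range(k) if c != own and mass[c] > 0]
        if mass[own] <= 1 or not others:
            continue  # singleton cluster or only one cluster: contributes 0
        a = dist[own] / (mass[own] - 1)  # mean distance within own cluster
        b = min(others)                  # mean distance to closest other cluster
        if max(a, b) > 0:
            total += w_p * (b - a) / max(a, b)
    return total / sum(weights)

if __name__ == "__main__":
    # Toy weighted point set (made-up data), scored for several values of k.
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (9.0, 9.0), (10.0, 9.0), (9.0, 10.0)]
    wts = [3, 1, 1, 2, 1, 1]
    for k in range(2, 5):
        centers = hybrid_kmeans(pts, wts, k)
        print(k, cost(pts, wts, centers), avg_silhouette(pts, wts, centers))
```

Because every function takes the weights explicitly, the same code runs unchanged on the full input (all weights 1) or on a much smaller weighted summary, which is where the speed-up for many values of k would come from. The silhouette routine is quadratic in the number of weighted points, so it is only practical on a small summary.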