Weighted k-means for density-biased clustering

Authors:
Kittisak Kerdprasop; Nittaya Kerdprasop; Pairote Sattayatham
Affiliations:
Data Engineering and Knowledge Discovery Research Unit, School of Computer Engineering, Suranaree University of Technology, Nakhon Ratchasima, Thailand (Kittisak Kerdprasop; Nittaya Kerdprasop); School of Mathematics, Suranaree University of Technology, Nakhon Ratchasima, Thailand (Pairote Sattayatham)
Venue:
DaWaK'05 Proceedings of the 7th international conference on Data Warehousing and Knowledge Discovery
Year:
2005

Abstract

Clustering is the task of grouping data by similarity. The popular k-means algorithm groups data by first assigning every data point to its closest cluster and then recomputing the cluster means; it repeats these two steps until convergence. We propose a variation, called weighted k-means, to improve clustering scalability. To speed up the clustering process, we develop reservoir-biased sampling, an efficient data reduction technique that needs only a single scan over the data set. Our algorithm is designed to cluster data drawn from mixture models. We present an experimental evaluation of the proposed method.
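The two-step k-means loop described in the abstract can be sketched with per-point weights, so that each point pulls its cluster mean in proportion to its weight. This is a minimal sketch, not the authors' algorithm: the abstract does not say how weights are defined, so the code simply assumes each point carries a given nonnegative weight. The names `weighted_kmeans` and `_sq_dist` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeans(
    points: Sequence[Point],
    weights: Sequence[float],
    init_centers: Sequence[Point],
    max_iter: int = 100,
    tol: float = 1e-9,
) -> Tuple[List[Point], List[int]]:
    """Lloyd-style k-means where each point counts in proportion to its weight.

    Hypothetical helper: weights are assumed to be supplied by the caller.
    Returns the final centers and the cluster index assigned to each point.
    """
    centers = [tuple(c) for c in init_centers]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Step 1: assign every point to its closest center.
        labels = [
            min(range(len(centers)), key=lambda j: _sq_dist(p, centers[j]))
            for p in points
        ]
        # Step 2: recompute each center as the weighted mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            total_w = 0.0
            acc = [0.0] * len(old)
            for p, w, lab in zip(points, weights, labels):
                if lab == j:
                    total_w += w
                    for d, x in enumerate(p):
                        acc[d] += w * x
            # An empty cluster keeps its previous center.
            new_centers.append(
                tuple(a / total_w for a in acc) if total_w > 0 else old
            )
        # Stop once no center moves by more than tol.
        shift = max(math.sqrt(_sq_dist(a, b)) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers, labels
```

For example, with points `(0, 0)`, `(0, 1)`, `(10, 10)`, `(10, 11)`, weights `1, 3, 1, 1`, and initial centers `(0, 0)` and `(10, 10)`, the first center settles at `(0.0, 0.75)` rather than the unweighted `(0.0, 0.5)`. Weighting is what would let a small sample stand in for a larger data set, with each retained point representing several originals.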
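The single-scan property mentioned in the abstract is the defining feature of reservoir sampling. The sketch below shows the classic uniform scheme often called Algorithm R, which keeps a fixed-size sample while reading the data exactly once. It is not the authors' reservoir-biased sampling: the abstract does not describe the biasing rule, so this example assumes plain uniform selection. The name `reservoir_sample` is hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], n: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Uniform sample of up to n items in one pass (Algorithm R, no density bias)."""
    rng = rng or random.Random()
    reservoir: List[T] = []
    for t, item in enumerate(stream):
        if t < n:
            # Fill the reservoir with the first n items.
            reservoir.append(item)
        else:
            # Keep item t (0-based) with probability n / (t + 1)
            # by overwriting a uniformly chosen slot.
            j = rng.randint(0, t)
            if j < n:
                reservoir[j] = item
    return reservoir
```

Because each item is seen once and only the n-slot reservoir is stored, memory stays constant regardless of the data set size. A density-biased variant would change the acceptance probability per item, which this uniform sketch deliberately does not attempt.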