K-means is one of the most popular clustering algorithms. This article introduces an efficient disk-based implementation of K-means. The proposed algorithm is designed to work inside a relational database management system, and it can cluster large data sets of very high dimensionality. In general, it requires only three scans over the data set. It is optimized for workloads dominated by disk I/O, and its memory requirements are low. Its parameters are easy to set. An extensive experimental section evaluates the quality of results and performance, comparing the proposed algorithm against the Standard K-means algorithm as well as the Scalable K-means algorithm.
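To make the low-memory, scan-based idea concrete, the following is a minimal sketch of the textbook K-means update applied to data that is streamed from storage rather than held in memory. It is not the algorithm proposed in the article, whose details the abstract does not give. The names `nearest`, `kmeans_scan`, and `kmeans`, the `scan` callable standing in for a table scan, and the default of three passes (chosen only to echo the abstract's scan count) are all assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Sequence

Point = Sequence[float]


def nearest(point: Point, centroids: List[List[float]]) -> int:
    """Return the index of the centroid closest to point (squared Euclidean)."""
    best_j, best_d = 0, float("inf")
    for j, c in enumerate(centroids):
        d = sum((x - y) ** 2 for x, y in zip(point, c))
        if d < best_d:
            best_j, best_d = j, d
    return best_j


def kmeans_scan(scan: Callable[[], Iterable[Point]],
                centroids: List[List[float]]) -> List[List[float]]:
    """Run one K-means iteration with a single sequential scan of the data.

    Only per-cluster counts and coordinate sums are kept in memory, so the
    memory footprint depends on k and the dimensionality, not on the number
    of points.
    """
    k, dims = len(centroids), len(centroids[0])
    counts = [0] * k
    sums = [[0.0] * dims for _ in range(k)]
    for p in scan():  # hypothetical stand-in for a table scan
        j = nearest(p, centroids)
        counts[j] += 1
        for d in range(dims):
            sums[j][d] += p[d]
    # An empty cluster keeps its previous centroid.
    return [
        [s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
        for j in range(k)
    ]


def kmeans(scan: Callable[[], Iterable[Point]],
           initial: Sequence[Point],
           passes: int = 3) -> List[List[float]]:
    """Repeat the scan-based update a fixed number of times (assumed: 3)."""
    centroids = [list(c) for c in initial]
    for _ in range(passes):
        centroids = kmeans_scan(scan, centroids)
    return centroids
```

For example, calling `kmeans(lambda: iter(rows), [(0.0, 0.0), (10.0, 10.0)])` with `rows = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]` rereads `rows` once per pass. In a database setting the `scan` callable would presumably wrap a cursor over the input table, which is why it is passed as a function rather than as an in-memory list.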