Efficient Disk-Based K-Means Clustering for Relational Databases

Authors:
Carlos Ordonez; Edward Omiecinski
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2004


Abstract

K-means is one of the most popular clustering algorithms. This article introduces an efficient disk-based implementation of K-means. The proposed algorithm is designed to work inside a relational database management system. It can cluster large data sets with very high dimensionality. In general, it requires only three scans over the data set. It is optimized for heavy disk I/O and has low memory requirements. Its parameters are easy to set. An extensive experimental section evaluates the quality of results and performance. The proposed algorithm is compared against the Standard K-means algorithm as well as the Scalable K-means algorithm.
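
To make the idea of clustering in a few sequential scans with low memory more concrete, here is a minimal Python sketch. It is not the algorithm proposed in the article, whose details the abstract does not give. It is plain K-means arranged so that each iteration is one sequential pass over the data and only per-cluster counts and sums are kept in memory. The names `kmeans_scans`, `nearest`, and `scan_points`, and the default cap of three scans, are assumptions chosen for illustration; `scan_points` stands in for a sequential table scan.

```python
from typing import Callable, Iterable, List, Sequence

Point = Sequence[float]


def nearest(point: Point, centroids: List[List[float]]) -> int:
    """Return the index of the centroid closest in squared Euclidean distance."""
    best_j, best_d = 0, float("inf")
    for j, c in enumerate(centroids):
        d = sum((x - y) ** 2 for x, y in zip(point, c))
        if d < best_d:
            best_j, best_d = j, d
    return best_j


def kmeans_scans(
    scan: Callable[[], Iterable[Point]],
    init_centroids: List[List[float]],
    n_scans: int = 3,  # assumption: one plain K-means iteration per scan
) -> List[List[float]]:
    """Run one standard K-means iteration per sequential scan of the data.

    Memory use is k counts plus k*d running sums; points are streamed
    from scan() and never stored.
    """
    centroids = [list(c) for c in init_centroids]
    k, d = len(centroids), len(centroids[0])
    for _ in range(n_scans):
        counts = [0] * k
        sums = [[0.0] * d for _ in range(k)]
        for p in scan():  # one sequential pass over the data set
            j = nearest(p, centroids)
            counts[j] += 1
            for i in range(d):
                sums[j][i] += p[i]
        for j in range(k):
            if counts[j]:  # an empty cluster keeps its previous centroid
                centroids[j] = [s / counts[j] for s in sums[j]]
    return centroids


def scan_points() -> Iterable[Point]:
    # Hypothetical stand-in for a sequential scan of a database table.
    yield from [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]


centroids = kmeans_scans(scan_points, [[0.0, 0.0], [10.0, 10.0]])
print(centroids)
```

Keeping only counts and sums per cluster is what bounds memory regardless of the number of rows. Plain K-means capped at three iterations will not generally converge on real data, so the sketch illustrates the access pattern only, not the result quality the article reports.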