Clustering cubes with binary dimensions in one pass

Authors:
Carlos Garcia-Alvarado;Carlos Ordonez
Affiliations:
Pivotal Inc., San Mateo, CA, USA;University of Houston, Houston, TX, USA
Venue:
Proceedings of the sixteenth international workshop on Data warehousing and OLAP
Year:
2013

Abstract

Finding aggregations of high-dimensional records in large data warehouses is a crucial but costly task. These groups of similar records are the partitions produced by GROUP BY clauses. In this research, we relax the GROUP BY clause and recast the problem as efficient binary clustering of a fact table, so that aggregations are computed over groups of similar records. We present an efficient window-based variant of the Incremental K-Means algorithm, implemented as a user-defined function in a relational database system. The speedup is achieved through the computation of sufficient statistics, multithreading, efficient distance computation, and sparse matrix operations. Finally, we compare the performance of our algorithm against multiple variants of the K-Means algorithm. Our experiments show that our incremental K-Means algorithm achieves similar or even better results more quickly than the traditional K-Means algorithm.
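The abstract names three ingredients that can be illustrated briefly: sufficient statistics, distance computation that exploits sparsity, and a one-pass, window-based centroid update. The following is a minimal Python sketch of those ideas, not the authors' implementation. The function name `incremental_kmeans_binary`, the seeding rule (the first k records become the initial centroids), the squared Euclidean distance, and the policy of refreshing centroids once every `window` records are all assumptions made for illustration. The sketch is single-threaded and runs outside a DBMS. Each record is given as the set of dimension indices whose value is 1.

```python
from typing import List, Sequence, Set, Tuple


def incremental_kmeans_binary(
    points: Sequence[Set[int]], dims: int, k: int, window: int
) -> Tuple[List[int], List[List[float]], List[int]]:
    """One-pass K-means sketch for sparse binary records (hypothetical helper)."""
    counts = [0] * k                               # N_j: records per cluster
    sums = [[0.0] * dims for _ in range(k)]        # M_j: per-dimension sums
    centroids = [[0.0] * dims for _ in range(k)]   # C_j = M_j / N_j
    norms = [0.0] * k                              # ||C_j||^2, distance to the zero vector
    labels: List[int] = []

    def refresh() -> None:
        # Recompute centroids from the sufficient statistics only.
        for j in range(k):
            if counts[j] > 0:
                centroids[j] = [m / counts[j] for m in sums[j]]
                norms[j] = sum(c * c for c in centroids[j])

    def distance(j: int, ones: Set[int]) -> float:
        # Squared Euclidean distance, touching only the nonzero dimensions:
        # start from ||C_j||^2 and correct each dimension d where x_d = 1
        # by (1 - C_jd)^2 - C_jd^2 = 1 - 2 * C_jd.
        return norms[j] + sum(1.0 - 2.0 * centroids[j][d] for d in ones)

    for i, ones in enumerate(points):
        if i < k:
            j = i                                  # assumed seeding: first k records
        else:
            j = min(range(k), key=lambda c: distance(c, ones))
        counts[j] += 1
        for d in ones:
            sums[j][d] += 1.0
        labels.append(j)
        # Refresh while seeding, then once per window of records.
        if i < k or (i + 1) % window == 0:
            refresh()

    refresh()                                      # final update after the single pass
    return labels, centroids, counts


if __name__ == "__main__":
    data = [{0, 1}, {2, 3}, {0, 1}, {2, 3}, {0}, {3}]
    labels, centroids, counts = incremental_kmeans_binary(data, dims=4, k=2, window=2)
    print(labels, counts)
```

Because the data are binary, each dimension's sum of squares equals its sum, so the pair (N_j, M_j) is enough to recover the centroids without revisiting earlier records. Deferring the centroid refresh to window boundaries trades some accuracy for fewer recomputations.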