Clustering is one of the most important tasks in data mining applications. This paper presents an efficient SQL implementation of the EM (Expectation-Maximization) algorithm for clustering very large databases. Our implementation handles high-dimensional data, a large number of clusters and, more importantly, a very large number of data records. We present three strategies for implementing EM in SQL: horizontal, vertical, and hybrid. We expect this work to be useful to data mining programmers and users who want to cluster large data sets inside a relational DBMS.
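
To make the idea of running a clustering step inside the DBMS more concrete, the following is a minimal sketch, not the paper's implementation. The abstract does not define the three layouts, so the sketch assumes that "vertical" means storing one row per (point, dimension) value. The table names `y` and `c`, the column names, and the function `squared_distances` are hypothetical. It also simplifies the E step to squared Euclidean distances (the identity-covariance case), computed with a join and a GROUP BY through Python's built-in sqlite3 module.

```python
import sqlite3


def squared_distances(points, centroids):
    """Return {(point_id, cluster_id): squared Euclidean distance}, computed in SQL.

    Assumed vertical layout (hypothetical, not taken from the paper):
      y(i, l, val): value of dimension l for point i
      c(j, l, val): value of dimension l for the mean of cluster j
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE y (i INTEGER, l INTEGER, val REAL)")
    con.execute("CREATE TABLE c (j INTEGER, l INTEGER, val REAL)")

    # Flatten each point and each centroid into one row per dimension.
    con.executemany(
        "INSERT INTO y VALUES (?, ?, ?)",
        [(i, l, v) for i, p in enumerate(points) for l, v in enumerate(p)],
    )
    con.executemany(
        "INSERT INTO c VALUES (?, ?, ?)",
        [(j, l, v) for j, m in enumerate(centroids) for l, v in enumerate(m)],
    )

    # Join on the dimension index, then sum squared differences per (point, cluster).
    rows = con.execute(
        "SELECT y.i, c.j, SUM((y.val - c.val) * (y.val - c.val)) "
        "FROM y JOIN c ON y.l = c.l "
        "GROUP BY y.i, c.j "
        "ORDER BY y.i, c.j"
    ).fetchall()
    con.close()
    return {(i, j): d for i, j, d in rows}


if __name__ == "__main__":
    dist = squared_distances([(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0), (3.0, 0.0)])
    for key in sorted(dist):
        print(key, dist[key])
```

Under this assumed layout the query text does not depend on the number of dimensions or clusters, because both appear as rows rather than columns. A layout with one column per dimension would instead need a query whose text grows with the dimensionality.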