Scalable Model-Based Clustering for Large Databases Based on Data Summarization
IEEE Transactions on Pattern Analysis and Machine Intelligence
The scalability problem in data mining involves the development of methods for handling large databases with limited computational resources. In this paper, we present a two-phase scalable model-based clustering framework. First, a large data set is summarized into sub-clusters. Then, clusters are generated directly from the summary statistics of the sub-clusters by a specifically designed Expectation-Maximization (EM) algorithm. Taking Gaussian mixture models as an example, we establish a provably convergent EM algorithm, EMADS, which explicitly incorporates the cardinality, mean, and covariance information of each sub-cluster. Combined with different data summarization procedures, EMADS is used to construct two clustering systems: gEMADS and bEMADS. The experimental results show that these systems run several orders of magnitude faster than the classic EM algorithm with little loss of accuracy. They also generate significantly better results than other model-based clustering systems that use similar computational resources.
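The two-phase idea can be sketched in a few lines of Python. This is a minimal one-dimensional illustration under stated assumptions. It is not the authors' EMADS or their gEMADS/bEMADS systems. Phase 1 uses a simple equal-width grid as a stand-in summarization procedure. Phase 2 runs EM in which every sub-cluster is represented only by its cardinality, mean, and mean of squares, which is the 1-D analogue of the covariance information. Two parts of the sketch are assumptions made for illustration:

- The E-step form scores each sub-cluster by the expected per-point log-density of its members.
- The names SubCluster, grid_summarize, log_psi and em_on_summaries are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class SubCluster:
    """Summary statistics kept for one sub-cluster (1-D case)."""
    n: int          # cardinality: number of points summarized
    mean: float     # mean of the points, E[x]
    sq_mean: float  # mean of the squared points, E[x^2]


def grid_summarize(xs, width):
    """Phase 1 (hypothetical stand-in): bucket points into equal-width grid
    cells and keep only (count, mean, mean of squares) per non-empty cell."""
    cells = {}
    for x in xs:
        key = math.floor(x / width)
        n, s, ss = cells.get(key, (0, 0.0, 0.0))
        cells[key] = (n + 1, s + x, ss + x * x)
    return [SubCluster(n, s / n, ss / n) for n, s, ss in cells.values()]


def log_psi(sc, mu, var):
    """Assumed E-step score: average log N(x | mu, var) over the points of a
    sub-cluster, computed from its summary alone via
    E[(x - mu)^2] = E[x^2] - 2*mu*E[x] + mu^2."""
    expected_sq_dev = sc.sq_mean - 2.0 * mu * sc.mean + mu * mu
    return -0.5 * math.log(2.0 * math.pi * var) - expected_sq_dev / (2.0 * var)


def em_on_summaries(subs, mus, variances, weights, iters=100, min_var=1e-6):
    """Phase 2: EM for a 1-D Gaussian mixture that reads only sub-cluster
    summaries, never the raw points. Returns (means, variances, weights)."""
    total = sum(sc.n for sc in subs)
    k = len(mus)
    mus, variances, weights = list(mus), list(variances), list(weights)
    for _ in range(iters):
        # E-step: one responsibility vector per sub-cluster (log-sum-exp for stability).
        resp = []
        for sc in subs:
            logs = [math.log(weights[j]) + log_psi(sc, mus[j], variances[j])
                    for j in range(k)]
            top = max(logs)
            exps = [math.exp(v - top) for v in logs]
            z = sum(exps)
            resp.append([e / z for e in exps])
        # M-step: a sub-cluster counts as n points sharing one responsibility.
        for j in range(k):
            w = [sc.n * r[j] for sc, r in zip(subs, resp)]
            wsum = sum(w)
            if wsum < 1e-12:
                continue  # component lost (almost) all support; keep old parameters
            weights[j] = wsum / total
            mus[j] = sum(wi * sc.mean for wi, sc in zip(w, subs)) / wsum
            second = sum(wi * sc.sq_mean for wi, sc in zip(w, subs)) / wsum
            variances[j] = max(second - mus[j] ** 2, min_var)
    return mus, variances, weights


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    data = [rng.gauss(-5.0, 1.0) for _ in range(5000)]
    data += [rng.gauss(5.0, 1.0) for _ in range(5000)]
    subs = grid_summarize(data, width=0.5)
    mus, variances, weights = em_on_summaries(subs, [-1.0, 1.0], [4.0, 4.0], [0.5, 0.5])
    print(f"{len(data)} points summarized into {len(subs)} sub-clusters")
    print("means:", [round(m, 2) for m in mus])
    print("variances:", [round(v, 2) for v in variances])
```

After the one-pass summarization, each EM iteration in this sketch costs time proportional to the number of sub-clusters times the number of components, not to the number of raw points. How much accuracy is lost depends on how coarse the summary is, which is controlled here by the grid width.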