The scalability problem in data mining involves developing methods that can handle large databases with limited computational resources, such as memory and computation time. This paper presents two scalable clustering algorithms based on the Gaussian mixture model, bEMADS and gEMADS. Both first summarize the data into subclusters and then generate Gaussian mixtures from these data summaries. Their core algorithm, EMADS, is defined on data summaries and approximates the aggregate behavior of each subcluster of data under the Gaussian mixture model. EMADS is provably convergent. Experimental results show that both algorithms can run several orders of magnitude faster than the standard expectation-maximization (EM) algorithm with little loss of accuracy.
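The two-stage idea in the abstract (summarize, then run EM on the summaries) can be illustrated with a small sketch. It assumes one-dimensional data, a hypothetical grid summarizer `summarize_grid` that keeps only a (count, sum, sum of squares) triple per occupied cell, and a hypothetical `em_on_summaries` that gives every point in a subcluster the same membership probabilities, computed from the subcluster's average log-density under each component. This is a simplified illustration under those assumptions, not the authors' bEMADS, gEMADS or EMADS; the function names, grid width, initialization and 1-D restriction are all illustrative choices.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Summary:
    """Sufficient statistics of one 1-D subcluster: count, sum, sum of squares."""
    n: int
    s: float
    ss: float

    @property
    def mean(self) -> float:
        return self.s / self.n

    @property
    def second_moment(self) -> float:
        return self.ss / self.n


def summarize_grid(xs, width):
    """Hypothetical grid summarizer: one Summary per occupied cell of the given width."""
    cells = {}
    for x in xs:
        key = math.floor(x / width)
        n, s, ss = cells.get(key, (0, 0.0, 0.0))
        cells[key] = (n + 1, s + x, ss + x * x)
    return [Summary(n, s, ss) for n, s, ss in cells.values()]


def expected_log_density(sub, mu, var):
    """Average of log N(x | mu, var) over the subcluster's points, from its summary alone."""
    # E[(x - mu)^2] = E[x^2] - 2*mu*E[x] + mu^2
    sq = sub.second_moment - 2.0 * mu * sub.mean + mu * mu
    return -0.5 * (math.log(2.0 * math.pi * var) + sq / var)


def em_on_summaries(subs, k, iters=100):
    """EM for a 1-D k-component Gaussian mixture that only ever reads the summaries."""
    total = sum(sub.n for sub in subs)
    grand_mean = sum(sub.s for sub in subs) / total
    grand_var = sum(sub.ss for sub in subs) / total - grand_mean ** 2
    lo = min(sub.mean for sub in subs)
    hi = max(sub.mean for sub in subs)
    # Deterministic start: means spread evenly over the range of subcluster means.
    means = [lo + (j + 0.5) * (hi - lo) / k for j in range(k)]
    variances = [grand_var] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: all points in a subcluster share one responsibility vector.
        resp = []
        for sub in subs:
            logs = [math.log(w) + expected_log_density(sub, m, v)
                    for w, m, v in zip(weights, means, variances)]
            top = max(logs)
            exps = [math.exp(val - top) for val in logs]
            z = sum(exps)
            resp.append([e / z for e in exps])
        # M-step: each subcluster contributes its full count, sum and sum of squares.
        for j in range(k):
            nj = sum(r[j] * sub.n for r, sub in zip(resp, subs))
            if nj < 1e-9:
                continue  # keep an empty component's previous parameters
            sj = sum(r[j] * sub.s for r, sub in zip(resp, subs))
            ssj = sum(r[j] * sub.ss for r, sub in zip(resp, subs))
            weights[j] = nj / total
            means[j] = sj / nj
            variances[j] = max(ssj / nj - means[j] ** 2, 1e-6)
        norm = sum(weights)
        weights = [w / norm for w in weights]
    return weights, means, variances


if __name__ == "__main__":
    rng = random.Random(42)
    xs = [rng.gauss(-5.0, 1.0) for _ in range(2000)] + [rng.gauss(5.0, 1.0) for _ in range(2000)]
    subs = summarize_grid(xs, 0.5)
    print(len(xs), "points ->", len(subs), "subclusters")
    print(em_on_summaries(subs, 2))
```

Each EM iteration in this sketch touches only the occupied grid cells rather than the raw points, so its cost grows with the number of subclusters instead of the database size, which is the kind of saving the abstract describes. Keeping the sum of squares, not just the cell mean, lets the variance estimates retain the spread of the points inside each subcluster.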