We propose several simple techniques that dramatically reduce both the memory demand and the computational effort of building multinomial mixture models with the EM algorithm. The reason for the dramatic performance improvement is that the techniques exploit two properties of the data: the data is sparse, and it contains many repeating records. We claim that particular sources of data consistently satisfy these properties; clickstream and retail data are excellent examples, being very sparse and containing many repetitions. Using these simple techniques, we observe huge speed-ups and compression rates on real-life clickstream data sets compared to the standard implementation of EM.
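To make the two ideas concrete, here is a minimal sketch (not the authors' actual implementation) of how sparsity and repeated records can be exploited in EM for a multinomial mixture. Records are assumed to be sparse tuples of `(item, count)` pairs; the hypothetical `compress` helper collapses duplicate records into `(record, multiplicity)` pairs, so each distinct record is processed once per iteration, and the E-step only touches a record's nonzero items:

```python
import math
import random
from collections import Counter

def compress(records):
    """Collapse duplicate records into (record, multiplicity) pairs.

    Each record is a tuple of (item, count) pairs -- a sparse row.
    """
    return list(Counter(records).items())

def em_sparse(records, K, iters=20, seed=0):
    """EM for a K-component multinomial mixture over sparse records."""
    rng = random.Random(seed)
    data = compress(records)                       # repeated records -> weights
    items = sorted({v for rec, _ in data for v, _ in rec})
    idx = {v: j for j, v in enumerate(items)}
    V = len(items)
    N = sum(w for _, w in data)                    # total number of records

    # Random initialisation of mixing weights and multinomial parameters.
    pi = [1.0 / K] * K
    theta = [[rng.random() + 0.1 for _ in range(V)] for _ in range(K)]
    for k in range(K):
        s = sum(theta[k])
        theta[k] = [t / s for t in theta[k]]

    for _ in range(iters):
        new_pi = [0.0] * K
        new_theta = [[1e-10] * V for _ in range(K)]  # tiny smoothing term
        for rec, w in data:
            # E-step: log-responsibilities sum only over nonzero items.
            logr = [math.log(pi[k])
                    + sum(c * math.log(theta[k][idx[v]]) for v, c in rec)
                    for k in range(K)]
            m = max(logr)
            r = [math.exp(l - m) for l in logr]
            s = sum(r)
            r = [x / s for x in r]
            # M-step accumulation, weighted by the record's multiplicity w.
            for k in range(K):
                new_pi[k] += w * r[k]
                for v, c in rec:
                    new_theta[k][idx[v]] += w * r[k] * c
        pi = [p / N for p in new_pi]
        for k in range(K):
            s = sum(new_theta[k])
            theta[k] = [t / s for t in new_theta[k]]
    return pi, theta
```

With many repetitions, `compress` shrinks the per-iteration work from the number of records to the number of *distinct* records, and the sparse inner sum replaces a pass over the full vocabulary; together these give the kind of speed-up and compression the abstract reports, without changing the fixed point of EM.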