A fast implementation of the EM algorithm for mixture of multinomials

  • Authors:
  • Jan Peter Patist

  • Affiliations:
  • Free University Amsterdam, Amsterdam, The Netherlands

  • Venue:
  • ADMA'06 Proceedings of the Second international conference on Advanced Data Mining and Applications
  • Year:
  • 2006

Abstract

We propose several simple techniques that dramatically reduce both the memory demand and the computational effort of building multinomial mixture models with the EM algorithm. The improvement comes from exploiting two properties of the data: the data is sparse, and it contains many repeated records. We claim that particular sources of data consistently satisfy these properties. Clickstream and retail data are excellent examples, since they are very sparse and highly repetitive. On real-life clickstream data sets, these simple techniques yield large speed-ups and high compression rates compared with the standard implementation of EM.
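The abstract does not spell out the techniques themselves. The following is a minimal Python sketch, assuming the two data properties it names can be exploited as follows: identical records are collapsed into one entry with a multiplicity count, and each record is stored as sparse (item, count) pairs so that the E-step touches only nonzero items. The function names `compress` and `em_multinomial_mixture`, the random initialization, and the additive smoothing parameter `alpha` are illustrative assumptions, not the paper's implementation.

```python
import math
import random
from collections import Counter


def compress(records):
    """Collapse duplicate records into (sparse_record, multiplicity) pairs.

    Each record is a sequence of integer item ids (e.g. pages clicked in one
    session). A record is stored sparsely as a sorted tuple of (item, count)
    pairs, so absent items cost nothing; identical records share one entry.
    """
    unique = Counter()
    for rec in records:
        unique[tuple(sorted(Counter(rec).items()))] += 1
    return list(unique.items())


def em_multinomial_mixture(data, n_items, n_components, n_iter=50, alpha=1e-2, seed=0):
    """EM for a mixture of multinomials over compressed, sparse data.

    Returns (pi, theta, loglik), where loglik is the weighted log-likelihood,
    up to the multinomial coefficients, of the parameters used in the last
    E-step. alpha is an illustrative additive smoothing term (an assumption)
    that keeps every probability strictly positive.
    """
    rng = random.Random(seed)
    pi = [1.0 / n_components] * n_components
    theta = []
    for _ in range(n_components):
        row = [rng.random() + 0.5 for _ in range(n_items)]
        s = sum(row)
        theta.append([v / s for v in row])
    total = sum(w for _, w in data)
    loglik = float("-inf")

    for _ in range(n_iter):
        log_pi = [math.log(p) for p in pi]
        log_theta = [[math.log(v) for v in row] for row in theta]
        nk = [0.0] * n_components
        item_counts = [[alpha] * n_items for _ in range(n_components)]
        loglik = 0.0
        for sparse, w in data:
            # E-step: the multinomial coefficient is the same for every
            # component and cancels, so only nonzero items are visited.
            scores = [log_pi[k] + sum(c * log_theta[k][j] for j, c in sparse)
                      for k in range(n_components)]
            m = max(scores)
            lse = m + math.log(sum(math.exp(s - m) for s in scores))
            loglik += w * lse
            for k in range(n_components):
                # Responsibility scaled by how often this record occurs.
                g = w * math.exp(scores[k] - lse)
                nk[k] += g
                for j, c in sparse:
                    item_counts[k][j] += g * c
        # M-step: smoothed updates from the multiplicity-weighted statistics.
        pi = [(n + alpha) / (total + n_components * alpha) for n in nk]
        theta = []
        for row in item_counts:
            s = sum(row)
            theta.append([v / s for v in row])
    return pi, theta, loglik


# Usage: six sessions over four items collapse to three distinct records.
sessions = [[0, 1, 1], [1, 0, 1], [2, 3], [3, 2], [0, 1, 1], [2]]
data = compress(sessions)
pi, theta, ll = em_multinomial_mixture(data, n_items=4, n_components=2)
print(len(data), [round(p, 3) for p in pi])
```

Because duplicates are weighted rather than repeated, the E-step cost in this sketch scales with the number of distinct records times their nonzero entries, rather than with the raw number of records times the number of items. Running the same function on the uncompressed sessions (each with multiplicity 1) gives the same log-likelihood up to floating-point rounding.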