Machine Learning - Special issue on learning with probabilistic representations
Accelerating EM for Large Databases. Machine Learning.
M-Kernel Merging: Towards Density Estimation over Data Streams. DASFAA '03: Proceedings of the Eighth International Conference on Database Systems for Advanced Applications.
Wavelet density estimators over data streams. Proceedings of the 2005 ACM Symposium on Applied Computing.
Naive Bayes models for probability estimation. ICML '05: Proceedings of the 22nd International Conference on Machine Learning.
On-line EM Algorithm for the Normalized Gaussian Network. Neural Computation.
Data Streams: Models and Algorithms (Advances in Database Systems).
In this paper we propose an algorithm for the on-line maintenance of the joint probability distribution of a data stream. The joint probability distribution is modeled by a mixture of low-dependence Bayesian networks and maintained by an on-line EM algorithm. Modeling the joint probability function by a mixture of low-dependence Bayesian networks is motivated by two key observations. First, the probability distribution can be maintained at a cost that is linear in the number of data points, i.e. constant time per data point, whereas other methods, such as general Bayesian networks, have polynomial time complexity. Second, there is empirical evidence in the literature [1] that mixtures of Naive-Bayes structures can model the data as accurately as Bayesian networks. In this paper we relax the constraint from mixtures of Naive-Bayes structures to mixtures of arbitrary low-dependence structures. Furthermore, we propose an on-line algorithm for the maintenance of a mixture model of arbitrary Bayesian networks. We empirically show that a speed-up is achieved with no decrease in modeling performance.
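To make the on-line EM maintenance concrete, the sketch below shows a stepwise (on-line) EM update for a mixture of Naive-Bayes components over discrete attributes, the simplest low-dependence structure. This is a minimal illustration under stated assumptions, not the paper's exact algorithm: the class name, the Robbins-Monro step-size schedule, and the pseudo-count initialization are all illustrative choices. Each incoming data point costs O(K * A) for K components and A attributes, i.e. constant time per point, which is the complexity property claimed above.

```python
import numpy as np

class OnlineNaiveBayesMixture:
    """Stepwise (on-line) EM for a mixture of Naive-Bayes components over
    discrete attributes. Illustrative sketch only; names and the decay
    schedule are assumptions, not taken from the paper."""

    def __init__(self, n_components, n_values_per_attr, seed=0):
        rng = np.random.default_rng(seed)
        self.K = n_components
        self.cards = list(n_values_per_attr)           # cardinality of each attribute
        # Running sufficient statistics, initialised with random pseudo-counts.
        self.s_pi = rng.random(self.K) + 1.0
        self.s_theta = [rng.random((self.K, c)) + 1.0 for c in self.cards]
        self.t = 0                                     # number of points seen so far

    def _params(self):
        # Normalise sufficient statistics into mixing weights and conditionals.
        pi = self.s_pi / self.s_pi.sum()
        theta = [s / s.sum(axis=1, keepdims=True) for s in self.s_theta]
        return pi, theta

    def responsibilities(self, x):
        """E-step: posterior over components for one data point x."""
        pi, theta = self._params()
        log_r = np.log(pi)
        for a, v in enumerate(x):
            log_r += np.log(theta[a][:, v])
        r = np.exp(log_r - log_r.max())
        return r / r.sum()

    def update(self, x):
        """One on-line EM step: blend the new point's sufficient statistics
        into the running ones with step size eta_t (assumed decay exponent)."""
        self.t += 1
        eta = (self.t + 2) ** -0.7
        r = self.responsibilities(x)
        self.s_pi = (1 - eta) * self.s_pi + eta * r
        for a, v in enumerate(x):
            self.s_theta[a] *= (1 - eta)
            self.s_theta[a][:, v] += eta * r

    def log_density(self, x):
        """Log of the estimated joint probability P(x) under the mixture."""
        pi, theta = self._params()
        log_p = np.log(pi)
        for a, v in enumerate(x):
            log_p += np.log(theta[a][:, v])
        return np.logaddexp.reduce(log_p)
```

In a streaming setting this is used in a single pass: call update(x) once for each arriving record and query log_density(x) whenever an estimate of the joint distribution is needed, so the total work stays linear in the stream length.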