New perspectives in autonomic design patterns for stream-classification-systems
Proceedings of the 2007 workshop on Automating service quality: Held at the International Conference on Automated Software Engineering (ASE)
Journal of Data and Information Quality (JDIQ)
Density estimation is an important pre-processing step in data stream classification, where the volume of data is overwhelming and the exact data distribution is unknown. We simplify the problem by employing a statistical sampling technique to obtain an approximate solution. With the proposed method, an unbounded data set can be sampled in a number of random configurations, and those samples can be used to describe the data set as a whole. The efficiency of the method depends largely on the ability to draw samples effectively, which in turn depends on how closely we can estimate the target density. We use finite mixture models to represent the probability density functions of the data stream, and we apply the EM algorithm twice to learn the model parameters. Experimental results demonstrate the efficiency of our estimation technique.
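
To illustrate the idea of fitting a finite mixture model with EM, the following is a minimal sketch, not the method of the paper. It assumes a one-dimensional Gaussian mixture, a fixed number of components, and a finite sample already drawn from the stream. The names `em_gmm_1d` and `draw_stream_sample`, the quantile-based initialisation, and the synthetic data are all hypothetical choices made for this example. It runs a single EM pass and does not reproduce the two-stage use of EM mentioned in the abstract, whose details are not given here.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, k, iters=50):
    """Fit a k-component 1-D Gaussian mixture with plain EM.

    Returns (weights, means, variances), with components sorted by mean.
    Initialisation (an assumption of this sketch): equal weights, means at
    evenly spaced quantiles of the sample, variances set to the sample variance.
    """
    n = len(data)
    ordered = sorted(data)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n

    weights = [1.0 / k] * k
    means = [ordered[((2 * j + 1) * n) // (2 * k)] for j in range(k)]
    variances = [overall_var] * k

    for _ in range(iters):
        # E-step: posterior probability (responsibility) of each component
        # for each data point under the current parameters.
        resp = []
        for x in data:
            p = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against numerical underflow
            resp.append([pj / total for pj in p])

        # M-step: re-estimate weights, means, and variances from the
        # responsibility-weighted data.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2
                        for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor to avoid collapse

    order = sorted(range(k), key=lambda j: means[j])
    return ([weights[j] for j in order],
            [means[j] for j in order],
            [variances[j] for j in order])


def draw_stream_sample(n, seed=1):
    """Synthetic stand-in for a sample drawn from a stream: 40% of points
    come from N(-2, 0.5^2) and 60% from N(3, 1^2)."""
    rng = random.Random(seed)
    return [rng.gauss(-2.0, 0.5) if rng.random() < 0.4 else rng.gauss(3.0, 1.0)
            for _ in range(n)]


if __name__ == "__main__":
    sample = draw_stream_sample(2000)
    weights, means, variances = em_gmm_1d(sample, k=2)
    print("weights:", weights)
    print("means:", means)
    print("variances:", variances)
```

On this synthetic sample, the estimated weights and means should land close to the generating values (0.4 and 0.6; -2 and 3). In a streaming setting, the finite sample passed to `em_gmm_1d` would come from whatever sampling scheme is in use, and the fitted mixture then serves as the approximate target density.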