M-Kernel Merging: Towards Density Estimation over Data Streams

  • Authors:
  • Aoying Zhou;Zhiyuan Cai;Li Wei;Weining Qian

  • Venue:
  • DASFAA '03 Proceedings of the Eighth International Conference on Database Systems for Advanced Applications
  • Year:
  • 2003

Abstract

Density estimation is a costly operation for computing the distribution information of data sets, and it underlies many important data mining applications such as clustering and biased sampling. However, traditional density estimation methods are inapplicable to streaming data, which arrives continuously and in large volume, because they require linear storage and quadratic computation. This shortcoming limits the application of many existing effective algorithms to data streams, where mining is both an urgent need for applications and a challenge for research. In this paper, the problem of computing density functions over data streams is examined. A novel method that addresses this shortcoming is developed to enable density estimation for large volumes of data in linear time and fixed-size memory, without loss of accuracy. The method is based on M-Kernel merging, so that the limited number of kernel functions to be maintained is determined intelligently. The application of the new method to different streaming data models is discussed, and the results of intensive experiments are presented. The analytical and empirical results show that this new density estimation algorithm for data streams can calculate density functions on demand at any time with high accuracy for different streaming data models.
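
The general idea of keeping a bounded set of kernels and merging them as data arrives can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not state the merge criterion, so the sketch assumes one-dimensional Gaussian kernels, merges the two neighbouring kernels whose means are closest, and combines them by moment matching. The class name MergingKDE and the parameters max_kernels and bandwidth are hypothetical names chosen for illustration.

```python
import bisect
import math


class MergingKDE:
    """Fixed-memory 1-D density estimate (illustrative sketch only).

    Assumptions not taken from the paper: Gaussian kernels, a
    closest-neighbour merge rule, and moment-matched merging.
    """

    def __init__(self, max_kernels, bandwidth):
        self.max_kernels = max_kernels  # memory budget: kernels kept
        self.bandwidth = bandwidth      # std of a freshly added kernel
        self.kernels = []               # sorted (mean, std, weight) tuples
        self.count = 0                  # number of points seen so far

    def add(self, x):
        """Insert one stream element, merging if over budget."""
        bisect.insort(self.kernels, (x, self.bandwidth, 1.0))
        self.count += 1
        if len(self.kernels) > self.max_kernels:
            self._merge_closest_pair()

    def _merge_closest_pair(self):
        # Find the adjacent pair with the smallest gap between means.
        i = min(
            range(len(self.kernels) - 1),
            key=lambda j: self.kernels[j + 1][0] - self.kernels[j][0],
        )
        (m1, s1, w1), (m2, s2, w2) = self.kernels[i], self.kernels[i + 1]
        w = w1 + w2
        mean = (w1 * m1 + w2 * m2) / w
        # Match the second moment of the two-kernel mixture.
        second = (w1 * (s1 * s1 + m1 * m1) + w2 * (s2 * s2 + m2 * m2)) / w
        std = math.sqrt(max(second - mean * mean, 1e-12))
        # The merged mean lies between m1 and m2, so order is preserved.
        self.kernels[i:i + 2] = [(mean, std, w)]

    def density(self, x):
        """Evaluate the current estimate at x; usable at any time."""
        if self.count == 0:
            return 0.0
        total = 0.0
        for mean, std, weight in self.kernels:
            z = (x - mean) / std
            total += weight * math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))
        return total / self.count
```

With max_kernels fixed, each call to add scans at most max_kernels + 1 kernels, so processing a stream of n points takes time linear in n and memory independent of n. How much accuracy is retained depends on the merge rule, which is the part the paper itself addresses.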