Finding frequent items in data streams using ESBF

  • Authors:
  • ShuYun Wang; XiuLan Hao; HeXiang Xu; YunFa Hu

  • Affiliations:
  • Department of Computing and Information Technology, Fudan University, P.R.C.;Department of Computing and Information Technology, Fudan University, P.R.C.;Department of Computing and Information Technology, Fudan University, P.R.C.;Department of Computing and Information Technology, Fudan University, P.R.C.

  • Venue:
  • PAKDD'07 Proceedings of the 2007 international conference on Emerging technologies in knowledge discovery and data mining
  • Year:
  • 2007

Abstract

In this paper, we introduce a novel data structure, ESBF (Extensible and Scalable Bloom Filter), and the algorithm FI-ESBF (Finding frequent Items using ESBF) for estimating the frequent items in data streams. FI-ESBF achieves high precision while using much less memory than the best previously reported algorithm, given the large number of distinct items in the stream. ESBF is an extension of the counting Bloom filter (CBF). With it, the amount of memory used can be adjusted dynamically according to the data distribution and the number of distinct items in the stream, so prior knowledge of the data distribution and of the number of distinct elements to be stored is not required.
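
For readers unfamiliar with the counting Bloom filter that ESBF extends, the following is a minimal sketch of a plain, fixed-size CBF used to flag frequent items. It is not the authors' ESBF or FI-ESBF: the abstract does not specify how ESBF grows, so no dynamic resizing is shown. The class and function names, the salted SHA-256 hashing, and the default sizes are all assumptions chosen for illustration.

```python
import hashlib


class CountingBloomFilter:
    """Fixed-size counting Bloom filter (illustrative baseline, not ESBF)."""

    def __init__(self, num_counters=1024, num_hashes=4):
        # Both sizes are fixed up front; this is the limitation that the
        # abstract says ESBF removes.
        self.num_counters = num_counters
        self.num_hashes = num_hashes
        self.counters = [0] * num_counters

    def _positions(self, item):
        # Derive num_hashes counter indices by salting a SHA-256 hash
        # (an assumed hashing scheme, chosen only for simplicity).
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_counters

    def add(self, item):
        # Increment every counter the item hashes to.
        for pos in self._positions(item):
            self.counters[pos] += 1

    def estimate(self, item):
        # The minimum counter is never below the true count, but hash
        # collisions can push it above the true count.
        return min(self.counters[pos] for pos in self._positions(item))


def frequent_items(stream, threshold, cbf=None):
    """Return items whose estimated count reached `threshold` (hypothetical helper)."""
    if cbf is None:
        cbf = CountingBloomFilter()
    candidates = set()
    for item in stream:
        cbf.add(item)
        if cbf.estimate(item) >= threshold:
            candidates.add(item)
    return candidates


if __name__ == "__main__":
    stream = ["a"] * 5 + ["b"] * 2 + ["c"]
    print(frequent_items(stream, threshold=3))
```

Because the estimate never undercounts, every item whose true count reaches the threshold is reported; collisions can only add false positives. In this fixed-size version the collision rate rises with the number of distinct items, which is why the counter array would normally have to be sized in advance.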