FIDS: Monitoring Frequent Items over Distributed Data Streams

  • Authors:
  • Robert Fuller; Mehmed Kantardzic

  • Affiliations:
  • Computer Engineering and Computer Science Department, University of Louisville, Louisville, KY 40292; Computer Engineering and Computer Science Department, University of Louisville, Louisville, KY 40292

  • Venue:
  • MLDM '07 Proceedings of the 5th international conference on Machine Learning and Data Mining in Pattern Recognition
  • Year:
  • 2007

Abstract

Many applications require the discovery of items that occur frequently across multiple distributed data streams. Past solutions to this problem either require a high degree of error tolerance or can provide results only periodically. In this paper we introduce a new algorithm designed for continuously tracking frequent items over distributed data streams, providing either exact or approximate answers. We tested the efficiency of our method using two real-world data sets. The results indicated a significant reduction in communication cost compared with naïve approaches and with an existing efficient algorithm called Top-K Monitoring. Because our method does not rely on approximation to reduce communication overhead and is explicitly designed for tracking frequent items, it also shows improved quality in its tracking results.