Filtering duplicate items over distributed data streams

  • Authors:
  • Tian Xia; Cheqing Jin; Xiaofang Zhou; Aoying Zhou

  • Affiliations:
  • Department of Computer Science and Engineering, Fudan University, Shanghai, P.R. China; Department of Computer Science and Engineering, Fudan University, Shanghai, P.R. China; School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, QLD, Australia; Department of Computer Science and Engineering, Fudan University, Shanghai, P.R. China

  • Venue:
  • WAIM'05 Proceedings of the 6th international conference on Advances in Web-Age Information Management
  • Year:
  • 2005

Abstract

In recent years, many real-time applications have needed to handle data streams. We consider distributed environments in which remote data sources continuously collect data from the real world or from other data sources and push it to a central stream processor. In such environments, transmitting rapid, high-volume, time-varying data streams induces significant communication cost, and processing them also incurs computing overhead at the central processor. In this paper, we develop a novel filtering approach, called DTFilter, for evaluating windowed distinct queries in such a distributed system. DTFilter is based on a search algorithm that uses a data structure of two height-balanced trees; it avoids transmitting duplicate items in the data streams and thus saves considerable network resources. We also provide a theoretical analysis of the search time and of the memory required. Extensive experiments show that DTFilter achieves high performance.
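The abstract does not spell out DTFilter's two-tree search, so the following is only a minimal sketch of the general idea of filtering duplicates at a remote source before transmission, not the authors' method. It assumes a time-based sliding window of length W (an item with timestamp t is in the window at time `now` iff now - W < t <= now), a single source, and a central processor that keeps each forwarded value until that copy's timestamp leaves the window. In place of the paper's two height-balanced trees, it swaps in a Python dict for lookup by value and a heap ordered by expiry time; the class `WindowedDuplicateFilter` and its methods are hypothetical names chosen for illustration.

```python
import heapq
import itertools
from typing import Dict, Hashable, List, Tuple

Message = Tuple[Hashable, float]  # (value, timestamp) forwarded to the central processor


class WindowedDuplicateFilter:
    """Hypothetical per-source filter for a time-based sliding window of length W.

    Assumption: the central processor holds a forwarded value until
    its timestamp + W is reached. The source forwards a value only when the
    central does not already hold a live copy, so the central's distinct set
    stays exact while duplicates are suppressed locally.
    """

    def __init__(self, window: float) -> None:
        self.window = window
        self.sent: Dict[Hashable, float] = {}       # value -> timestamp of the copy the central holds
        self.last_seen: Dict[Hashable, float] = {}  # value -> timestamp of its latest local occurrence
        self.expiries: List[Tuple[float, int, Hashable]] = []  # min-heap: (expiry, tiebreak, value)
        self._seq = itertools.count()  # tiebreaker so values never need to be comparable

    def _forward(self, value: Hashable, ts: float, out: List[Message]) -> None:
        # Record the copy the central now holds and schedule its expiry.
        self.sent[value] = ts
        heapq.heappush(self.expiries, (ts + self.window, next(self._seq), value))
        out.append((value, ts))

    def advance(self, now: float) -> List[Message]:
        """Expire central copies whose window has closed by time `now`."""
        out: List[Message] = []
        while self.expiries and self.expiries[0][0] <= now:
            _, _, value = heapq.heappop(self.expiries)
            latest = self.last_seen[value]
            if latest > self.sent[value]:
                # A suppressed, newer occurrence is still inside the window:
                # re-forward it so the central does not wrongly drop the value.
                self._forward(value, latest, out)
            else:
                # No newer occurrence: the value leaves the window everywhere.
                del self.sent[value]
                del self.last_seen[value]
        return out

    def process(self, value: Hashable, ts: float) -> List[Message]:
        """Handle one local item (timestamps non-decreasing); return messages to send."""
        out = self.advance(ts)
        self.last_seen[value] = ts
        if value not in self.sent:  # not live at the central: must transmit
            self._forward(value, ts, out)
        return out
```

A small usage example: feeding the stream `[("a", 0), ("b", 1), ("a", 2), ("a", 5), ("b", 8), ("c", 9), ("a", 12), ("b", 20)]` with W = 10 through `process` suppresses the repeats at times 2, 5 and 8, but re-forwards `a` and `b` when their first copies expire, because newer occurrences are still in the window. That refresh step is the design choice that keeps the central's windowed distinct answer exact; a filter that simply dropped every repeat would save more messages but let still-present values fall out of the central's window early.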