One is enough: distributed filtering for duplicate elimination

  • Authors:
  • Georgia Koloniari; Nikos Ntarmos; Evaggelia Pitoura; Dimitris Souravlias

  • Affiliations:
  • University of Ioannina, Ioannina, Greece; University of Ioannina, Ioannina, Greece; University of Ioannina, Ioannina, Greece; University of Ioannina, Ioannina, Greece

  • Venue:
  • Proceedings of the 20th ACM international conference on Information and knowledge management
  • Year:
  • 2011

Abstract

The growth of online services has created the need for duplicate elimination in high-volume streams of events. The sheer volume of data in applications such as pay-per-click clickstream processing, RSS feed syndication, and notification services in social sites such as Twitter and Facebook makes traditional centralized solutions hard to scale. In this paper, we propose an approach based on distributed filtering. To this end, we introduce a suite of distributed Bloom filters that exploit different ways of partitioning the event space. To address the continuous nature of event delivery, the filters are extended to support sliding window semantics. Moreover, we examine locality-related tradeoffs and propose a tree-based architecture to allow for duplicate elimination across geographic locations. We chart the design space and present experimental results that demonstrate the pros and cons of our various solutions in different settings.
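
To make the general idea concrete, the following is a minimal sketch of one way to combine a partitioned event space with window-bounded Bloom filters. It is not the paper's design: the abstract does not specify the filters, so hash-based routing of event identifiers to nodes, the slice-based approximation of a sliding window, and all names and parameters (`BloomFilter`, `WindowedFilter`, `PartitionedDeduplicator`, `num_bits`, `num_hashes`, `num_slices`, `slice_len`) are assumptions made for illustration, and the "nodes" are simulated in a single process.

```python
import hashlib
from collections import deque


class BloomFilter:
    """Plain Bloom filter over strings, using salted SHA-256 hashes."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting the item with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


class WindowedFilter:
    """Approximates a count-based sliding window with a ring of filters.

    Assumption: the window is split into num_slices slices of slice_len
    events each; when a slice fills up, the oldest one is discarded.
    """

    def __init__(self, num_slices=4, slice_len=100):
        self.slice_len = slice_len
        self.slices = deque([BloomFilter()], maxlen=num_slices)
        self.count = 0

    def seen_before(self, item):
        duplicate = any(item in s for s in self.slices)
        if not duplicate:
            self.slices[-1].add(item)
        self.count += 1
        if self.count % self.slice_len == 0:
            # deque(maxlen=...) drops the oldest slice automatically.
            self.slices.append(BloomFilter())
        return duplicate


class PartitionedDeduplicator:
    """Routes each event id to exactly one (simulated) node by hashing."""

    def __init__(self, num_nodes=3):
        self.nodes = [WindowedFilter() for _ in range(num_nodes)]

    def _owner(self, event_id):
        digest = hashlib.sha256(f"route:{event_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def is_duplicate(self, event_id):
        return self.nodes[self._owner(event_id)].seen_before(event_id)
```

In this sketch, calling `is_duplicate` with the same identifier twice in quick succession reports a duplicate on the second call, because both calls are routed to the same node. A repeat that arrives after its slice has been discarded is treated as new. As with any Bloom filter, false positives are possible (a new event may occasionally be flagged as a duplicate), while an event still inside the window is never missed.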