Duplicate detection in pay-per-click streams using temporal stateful Bloom filters

  • Authors:
  • Chamila Walgampaya; Mehmed Kantardzic; Brent Wenerstrom

  • Affiliations:
  • Computer Science and Computer Engineering Department, Speed School of Engineering, University of Louisville, Louisville, KY 40292, USA (all authors)

  • Venue:
  • International Journal of Data Analysis Techniques and Strategies
  • Year:
  • 2012

Abstract

Detecting duplicates in click data streams is an important task in the fight against click fraud, the act of generating false clicks in internet advertising. Revenue-generating advertising models that charge advertisers for each click leave room for individuals or rival companies to generate false clicks. The damage that click fraud does to online advertising has grown tremendously over the years. In this paper, we consider the problem of detecting duplicates in click data streams. Our solution, the temporal stateful Bloom filter (TSBF), is a modified version of the counting Bloom filter: it extends the standard counting Bloom filter by replacing the bit vector with an array of state counters. These counters are dynamic and decay with time. We conducted a comprehensive set of experiments using synthetic and real-world data. Results are compared with the buffering techniques used in NetMosaics, a click fraud detection and prevention solution. Our results show that the TSBF approach achieves 99% accuracy on duplicate detection while keeping its space requirement constant.
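
The abstract does not give the TSBF's internal details, so the following is only a minimal sketch of the general idea of a Bloom filter whose cells decay with time. It is not the authors' algorithm. The class name `DecayingBloomFilter`, the parameters `num_cells`, `num_hashes` and `window`, the use of salted SHA-256 hashes, and the choice to model decay by storing a last-set timestamp per cell are all assumptions made for illustration.

```python
import hashlib


class DecayingBloomFilter:
    """Illustrative time-decaying Bloom filter (an assumption, not the paper's TSBF)."""

    def __init__(self, num_cells=1024, num_hashes=3, window=60.0):
        self.num_cells = num_cells      # fixed size, so memory use stays constant
        self.num_hashes = num_hashes    # number of cells touched per key
        self.window = window            # seconds after which a cell has decayed
        # Each cell holds the time it was last set, or None if never set.
        self.cells = [None] * num_cells

    def _indexes(self, key):
        # Derive num_hashes cell indexes from salted SHA-256 digests of the key.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_cells

    def _is_live(self, index, now):
        # A cell counts as set only if it was stamped within the window.
        stamp = self.cells[index]
        return stamp is not None and now - stamp < self.window

    def seen_recently(self, key, now):
        """Report whether key looks like a duplicate within the window, then record it."""
        indexes = list(self._indexes(key))
        duplicate = all(self._is_live(i, now) for i in indexes)
        for i in indexes:
            self.cells[i] = now
        return duplicate


# Hypothetical click key: source address and ad identifier joined into one string.
clicks = DecayingBloomFilter(window=60.0)
first = clicks.seen_recently("203.0.113.7|ad-42", now=0.0)
repeat = clicks.seen_recently("203.0.113.7|ad-42", now=10.0)
late = clicks.seen_recently("203.0.113.7|ad-42", now=100.0)
```

In this sketch the repeat click at 10 seconds is flagged as a duplicate, while the first click and the one arriving after the 60-second window are not. As with any Bloom filter, unrelated keys can collide and produce false positives, and the array never grows, which is the sense in which space stays constant here.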