Detecting duplicates in click data streams is an important task in the fight against click fraud, the act of generating false clicks in Internet advertising. Revenue-generating advertising models that charge advertisers for each click leave room for individuals or rival companies to generate false clicks. The damage that click fraud does to online advertising has grown tremendously over the years. In this paper, we consider the problem of detecting duplicates in click data streams. Our solution uses a modified version of the counting Bloom filter: the temporal stateful Bloom filter (TSBF) extends the standard counting Bloom filter by replacing the bit vector with an array of counters of states. These counters are dynamic and decay with time. We conducted a comprehensive set of experiments using synthetic and real-world data. Results are compared with the buffering techniques used in NetMosaics, a click fraud detection and prevention solution. Our results show that the TSBF approach achieves 99% accuracy on duplicate detection while keeping its space requirement constant.
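To make the idea of time-decaying cells concrete, the following is a minimal sketch of a fixed-size filter for duplicate detection over a time window. It is not the TSBF construction itself, whose counter and state layout the abstract does not specify. As an assumption, each cell stores the time it was last touched, which stands in for a counter that decays to zero once the window has elapsed. The class name `DecayingFilter`, its parameters, and the `"ip|ad"` click key format are all hypothetical.

```python
import hashlib


class DecayingFilter:
    """Fixed-size array of last-seen timestamps, probed by k hash positions.

    Illustrative only: a timestamp per cell approximates a decaying counter.
    """

    def __init__(self, num_cells=1024, num_hashes=4, window=60.0):
        self.num_cells = num_cells
        self.num_hashes = num_hashes
        self.window = window
        # None marks a cell that has never been set.
        self.cells = [None] * num_cells

    def _positions(self, key):
        # Derive k indices from a single digest via double hashing.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.num_cells for i in range(self.num_hashes)]

    def seen_recently(self, key, now):
        """Report whether key looks like a duplicate within the window,
        then record the key as seen at time `now`."""
        positions = self._positions(key)
        duplicate = all(
            self.cells[p] is not None and now - self.cells[p] < self.window
            for p in positions
        )
        for p in positions:
            self.cells[p] = now
        return duplicate


if __name__ == "__main__":
    f = DecayingFilter(window=60.0)
    print(f.seen_recently("10.0.0.1|ad7", now=0.0))    # first click
    print(f.seen_recently("10.0.0.1|ad7", now=10.0))   # repeat inside the window
    print(f.seen_recently("10.0.0.1|ad7", now=100.0))  # repeat after the window
```

The memory footprint is fixed by `num_cells` regardless of stream length, which mirrors the constant-space property claimed above. As with any Bloom-filter variant, distinct keys can collide on all probed cells, so the sketch can report false duplicates.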