Probabilistic counting algorithms for data base applications
Journal of Computer and System Sciences
A linear-time probabilistic counting algorithm for database applications
ACM Transactions on Database Systems (TODS)
The space complexity of approximating the frequency moments
STOC '96 Proceedings of the twenty-eighth annual ACM symposium on Theory of computing
New sampling-based summary statistics for improving approximate query answers
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
On the security of pay-per-click and other Web advertising schemes
WWW '99 Proceedings of the eighth international conference on World Wide Web
Packet classification on multiple fields
Proceedings of the conference on Applications, technologies, architectures, and protocols for computer communication
Implementing a relational database by means of specialized hardware
ACM Transactions on Database Systems (TODS)
NiagaraCQ: a scalable continuous query system for Internet databases
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Summary cache: a scalable wide-area web cache sharing protocol
IEEE/ACM Transactions on Networking (TON)
Hancock: a language for extracting signatures from data streams
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Space/time trade-offs in hash coding with allowable errors
Communications of the ACM
On computing correlated aggregates over continual data streams
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Space-efficient online computation of quantile summaries
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
STOC '01 Proceedings of the thirty-third annual ACM symposium on Theory of computing
Collection statistics for fast duplicate document detection
ACM Transactions on Information Systems (TOIS)
A knowledge-based approach for duplicate elimination in data cleaning
Information Systems - Data extraction, cleaning and reconciliation
Maintaining stream statistics over sliding windows: (extended abstract)
SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Filtering Duplicate Publications in Bibliographic Databases
NDDL '01 Proceedings of the 1st International Workshop on New Developments in Digital Libraries: in conjunction with ICEIS 2001
Computing Iceberg Queries Efficiently
VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases
Dynamic Maintenance of Wavelet-Based Histograms
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Surfing Wavelets on Streams: One-Pass Summaries for Approximate Aggregate Queries
Proceedings of the 27th International Conference on Very Large Data Bases
Sampling-Based Estimation of the Number of Distinct Values of an Attribute
VLDB '95 Proceedings of the 21st International Conference on Very Large Data Bases
An Approach to Identify Duplicated Web Pages
COMPSAC '02 Proceedings of the 26th International Computer Software and Applications Conference on Prolonging Software Life: Development and Redevelopment
Finding Frequent Items in Data Streams
ICALP '02 Proceedings of the 29th International Colloquium on Automata, Languages and Programming
Histogramming Data Streams with Fast Per-Item Processing
ICALP '02 Proceedings of the 29th International Colloquium on Automata, Languages and Programming
Frequency Estimation of Internet Packet Streams with Limited Space
ESA '02 Proceedings of the 10th Annual European Symposium on Algorithms
A simple algorithm for finding frequent elements in streams and bags
ACM Transactions on Database Systems (TODS)
What's hot and what's not: tracking most frequent items dynamically
Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Issues in data stream management
ACM SIGMOD Record
An Approximate L1-Difference Algorithm for Massive Data Streams
FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
Models and Algorithms for Duplicate Document Detection
ICDAR '99 Proceedings of the Fifth International Conference on Document Analysis and Recognition
Duplicate Detection for Symbolically Compressed Documents
ICDAR '99 Proceedings of the Fifth International Conference on Document Analysis and Recognition
New directions in traffic measurement and accounting: Focusing on the elephants, ignoring the mice
ACM Transactions on Computer Systems (TOCS)
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Identifying frequent items in sliding windows over on-line packet streams
Proceedings of the 3rd ACM SIGCOMM conference on Internet measurement
Adaptive duplicate detection using learnable string similarity measures
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
A Web page prediction model based on click-stream tree representation of user behavior
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Dynamically maintaining frequent items over a data stream
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
Online duplicate document detection: signature reliability in a dynamic retrieval environment
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
Continuously Maintaining Quantile Summaries of the Most Recent N Elements over a Data Stream
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Diamond in the rough: finding Hierarchical Heavy Hitters in multi-dimensional data
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Robust Identification of Fuzzy Duplicates
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Detecting hit shaving in click-through payment schemes
WOEC'98 Proceedings of the 3rd conference on USENIX Workshop on Electronic Commerce - Volume 3
Approximate frequency counts over data streams
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
StatStream: statistical monitoring of thousands of data streams in real time
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Eliminating fuzzy duplicates in data warehouses
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Finding hierarchical heavy hitters in data streams
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Efficient computation of frequent and top-k elements in data streams
ICDT'05 Proceedings of the 10th international conference on Database Theory
Using association rules for fraud detection in web advertising networks
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Approximately detecting duplicates for streaming data using stable bloom filters
Proceedings of the 2006 ACM SIGMOD international conference on Management of data
An integrated efficient solution for computing frequent and top-k elements in data streams
ACM Transactions on Database Systems (TODS)
Detectives: detecting coalition hit inflation attacks in advertising networks streams
Proceedings of the 16th international conference on World Wide Web
Algorithms and incentives for robust ranking
SODA '07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms
A network mitigation system against distributed denial of service: a linux-based prototype
IMSA'07 IASTED European Conference on Proceedings of the IASTED European Conference: internet and multimedia systems and applications
SLEUTH: Single-pubLisher attack dEtection Using correlaTion Hunting
Proceedings of the VLDB Endowment
An Economic Model of Click Fraud in Publisher Networks
International Journal of Electronic Commerce
The Glitch in On-line Advertising: A Study of Click Fraud in Pay-Per-Click Advertising Programs
International Journal of Electronic Commerce
Finding duplicates in a data stream
SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
Improved approximate detection of duplicates for data streams over sliding windows
Journal of Computer Science and Technology
Dynamically Maintaining Duplicate-Insensitive and Time-Decayed Sum Using Time-Decaying Bloom Filter
ICA3PP '09 Proceedings of the 9th International Conference on Algorithms and Architectures for Parallel Processing
"Same, Same but Different" A Survey on Duplicate Detection Methods for Situation Awareness
OTM '09 Proceedings of the Confederated International Conferences, CoopIS, DOA, IS, and ODBASE 2009 on On the Move to Meaningful Internet Systems: Part II
A network mitigation system against distributed denial of service: a Linux-based prototype
EuroIMSA '07 Proceedings of the Third IASTED European Conference on Internet and Multimedia Systems and Applications
Fast approximate duplicate detection for 2D-NMR spectra
DILS'07 Proceedings of the 4th international conference on Data integration in the life sciences
An effective method for combating malicious scripts clickbots
ESORICS'09 Proceedings of the 14th European conference on Research in computer security
Cardinality estimation and dynamic length adaptation for Bloom filters
Distributed and Parallel Databases
A hybrid fraud scoring and spike detection technique in streaming data
Intelligent Data Analysis
The dark side of the Internet: Attacks, costs and responses
Information Systems
Tight bounds for Lp samplers, finding duplicates in streams, and related problems
Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Spam or ham?: characterizing and detecting fraudulent "not spam" reports in web mail systems
Proceedings of the 8th Annual Collaboration, Electronic messaging, Anti-Abuse and Spam Conference
One is enough: distributed filtering for duplicate elimination
Proceedings of the 20th ACM international conference on Information and knowledge management
Understanding fraudulent activities in online ad exchanges
Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference
Cardinality computing: a new step towards fully representing multi-sets by bloom filters
WISE'06 Proceedings of the 7th international conference on Web Information Systems
Detecting frauds in online advertising systems
EC-Web'06 Proceedings of the 7th international conference on E-Commerce and Web Technologies
Proceedings of the 15th International Conference on Extending Database Technology
An approximate duplicate elimination in RFID data streams
Data & Knowledge Engineering
Approximate membership query over time-decaying windows for event stream processing
Proceedings of the 6th ACM International Conference on Distributed Event-Based Systems
Duplicate detection in pay-per-click streams using temporal stateful Bloom filters
International Journal of Data Analysis Techniques and Strategies
Memory efficient minimum substring partitioning
Proceedings of the VLDB Endowment
Data stream clustering: A survey
ACM Computing Surveys (CSUR)
Overview of turn data management platform for digital advertising
Proceedings of the VLDB Endowment
Streaming quotient filter: a near optimal approximate duplicate detection approach for data streams
Proceedings of the VLDB Endowment
Automatic optimization of stream programs via source program operator graph transformations
Distributed and Parallel Databases
TWINS: Efficient time-windowed in-network joins for sensor networks
Information Sciences: an International Journal
We consider the problem of finding duplicates in data streams. Duplicate detection in data streams is used in various applications, including fraud detection. We develop a solution based on Bloom filters [9] and discuss the space and time requirements of running the proposed algorithm in the contexts of both sliding and landmark stream windows. We run a comprehensive set of experiments, using both real and synthetic click streams, to evaluate the performance of the proposed solution. The results demonstrate that the proposed solution yields extremely low error rates.
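As a rough illustration of the general idea, the sketch below flags repeated click identifiers with a classical Bloom filter that is reset at the start of each landmark window. It is not the algorithm proposed in the paper: the names `BloomFilter` and `flag_duplicates`, the SHA-256 double-hashing scheme, and the default sizes are all assumptions made for this example. A sliding window would additionally need a way to expire old clicks, which this sketch does not attempt.

```python
import hashlib


class BloomFilter:
    """Classical Bloom filter with num_bits bits and num_hashes hash functions.

    Illustrative only; the sizes and hashing scheme are assumptions.
    """

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes bit positions by double hashing one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "possibly seen" (false positives can occur);
        # False means "definitely not seen" (no false negatives).
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )

    def clear(self) -> None:
        self.bits = bytearray(len(self.bits))


def flag_duplicates(clicks, window_size, num_bits=1 << 16, num_hashes=4):
    """Yield (click_id, is_duplicate) pairs over landmark windows.

    A new landmark window starts every window_size clicks, at which point
    the filter is emptied (hypothetical policy chosen for this sketch).
    """
    bloom = BloomFilter(num_bits, num_hashes)
    for index, click_id in enumerate(clicks):
        if index > 0 and index % window_size == 0:
            bloom.clear()  # start of a new landmark window
        seen = click_id in bloom
        if not seen:
            bloom.add(click_id)
        yield click_id, seen


if __name__ == "__main__":
    stream = ["a", "b", "a", "c", "a", "b"]
    for click_id, is_dup in flag_duplicates(stream, window_size=4):
        print(click_id, "duplicate" if is_dup else "new")
```

Because a Bloom filter never produces false negatives, a click repeated within the same window is always flagged. The errors are false positives, whose rate depends on the filter size, the number of hash functions, and the number of distinct clicks per window.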