We propose an integrated approximate approach to two problems: finding the k most popular elements, and finding the frequent elements, in a data stream drawn from a large domain. Our solution is space efficient and reports both frequent and top-k elements with tight guarantees on errors. For general data distributions, our top-k algorithm returns k elements that have roughly the highest frequencies, and it uses limited space for calculating frequent elements. For realistic Zipfian data, the space required by the proposed algorithm to solve the exact frequent elements problem decreases dramatically with the parameter of the distribution, and for top-k queries the analysis ensures that only the top-k elements, in the correct order, are reported. Experiments on real and synthetic data sets show space reductions with hardly any loss in accuracy. Having established the effectiveness of the proposed approach through both analysis and experiments, we extend it to answer continuous queries about frequent and top-k elements. Although incremental reporting of frequent and top-k elements is useful in many applications, to the best of our knowledge no solution has previously been proposed.
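The abstract does not spell out the algorithm itself, so the following is only a minimal sketch of how a single bounded-size summary could serve both query types. It assumes a counter-based scheme in the style commonly called Space-Saving: a fixed number of counters is kept, and when an unmonitored element arrives with no free counter, it takes over the counter with the smallest count. The class name `CounterSummary`, its methods, and the `capacity` and `phi` parameters are hypothetical names chosen for illustration, not taken from the paper.

```python
class CounterSummary:
    """Illustrative bounded-size stream summary (assumed Space-Saving style)."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity  # maximum number of monitored elements
        self.counts = {}          # element -> estimated count (never an underestimate)
        self.errors = {}          # element -> maximum possible overestimation
        self.n = 0                # number of stream items seen so far

    def offer(self, item):
        """Process one stream item."""
        self.n += 1
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            self.counts[item] = 1
            self.errors[item] = 0
        else:
            # Evict the element with the smallest estimated count; the
            # newcomer inherits that count as its possible overestimation.
            victim = min(self.counts, key=self.counts.get)
            floor = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[item] = floor + 1
            self.errors[item] = floor

    def top_k(self, k):
        """Return up to k (element, estimated count) pairs, highest count first."""
        ranked = sorted(self.counts.items(), key=lambda kv: -kv[1])
        return ranked[:k]

    def frequent(self, phi):
        """Return candidates whose estimated count exceeds phi * n.

        Each entry is (element, estimated count, guaranteed), where
        guaranteed is True when even the lower bound (count minus error)
        exceeds the threshold.
        """
        threshold = phi * self.n
        result = []
        for item, count in self.counts.items():
            if count > threshold:
                guaranteed = (count - self.errors[item]) > threshold
                result.append((item, count, guaranteed))
        return result


summary = CounterSummary(capacity=3)
for element in "aaaaabbbcd":
    summary.offer(element)
print(summary.top_k(2))
print(summary.frequent(0.25))
```

Under these assumptions, both queries read the same set of counters, which is the sense in which the approach is integrated. The linear scan for the minimum keeps the sketch short; a practical implementation would likely use a structure that finds the minimum counter in constant time.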