Probabilistic counting algorithms for data base applications
Journal of Computer and System Sciences
The space complexity of approximating the frequency moments
STOC '96 Proceedings of the twenty-eighth annual ACM symposium on Theory of computing
Approximate medians and other quantiles in one pass and with limited memory
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Tracking join and self-join sizes in limited storage
PODS '99 Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Mining high-speed data streams
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
On computing correlated aggregates over continual data streams
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Space-efficient online computation of quantile summaries
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
STOC '01 Proceedings of the thirty-third annual ACM symposium on Theory of computing
Models and issues in data stream systems
Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Processing complex aggregate queries over data streams
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Dynamic multidimensional histograms
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Gigascope: high performance network monitoring with an SQL interface
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Querying and mining data streams: you only get one look a tutorial
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Data streams: algorithms and applications
SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total
ICDE '96 Proceedings of the Twelfth International Conference on Data Engineering
Surfing Wavelets on Streams: One-Pass Summaries for Approximate Aggregate Queries
Proceedings of the 27th International Conference on Very Large Data Bases
Distinct Sampling for Highly-Accurate Answers to Distinct Values Queries and Event Reports
Proceedings of the 27th International Conference on Very Large Data Bases
Fast Incremental Maintenance of Approximate Histograms
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Finding Frequent Items in Data Streams
ICALP '02 Proceedings of the 29th International Colloquium on Automata, Languages and Programming
A simple algorithm for finding frequent elements in streams and bags
ACM Transactions on Database Systems (TODS)
FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Gigascope: a stream database for network applications
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Fjording the Stream: An Architecture for Queries Over Streaming Sensor Data
ICDE '02 Proceedings of the 18th International Conference on Data Engineering
An improved data stream summary: the count-min sketch and its applications
Journal of Algorithms
Tribeca: a system for managing large databases of network traffic
ATEC '98 Proceedings of the annual conference on USENIX Annual Technical Conference
Monitoring streams: a new class of data management applications
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Comparing data streams using Hamming norms (how to zero in)
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Approximate frequency counts over data streams
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
How to summarize the universe: dynamic maintenance of quantiles
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Data stream query processing: a tutorial
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Reversible sketches for efficient and accurate change detection over network data streams
Proceedings of the 4th ACM SIGCOMM conference on Internet measurement
Effective Computation of Biased Quantiles over Data Streams
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Sampling algorithms in a stream operator
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Summarizing and mining inverse distributions on data streams via dynamic inverse sampling
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Data streams: algorithms and applications
Foundations and Trends® in Theoretical Computer Science
A data stream language and system designed for power and extensibility
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Reversible sketches: enabling monitoring and analysis over high-speed data streams
IEEE/ACM Transactions on Networking (TON)
Finding hierarchical heavy hitters in streaming data
ACM Transactions on Knowledge Discovery from Data (TKDD)
Query-aware partitioning for monitoring massive network data streams
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Time-decaying aggregates in out-of-order streams
Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Scaling issues in network monitoring
SSPS '08 Proceedings of the 2nd international workshop on Scalable stream processing system
Prefilter: predicate pushdown at streaming speeds
SSPS '08 Proceedings of the 2nd international workshop on Scalable stream processing system
Summarizing Two-Dimensional Data with Skyline-Based Statistical Descriptors
SSDBM '08 Proceedings of the 20th international conference on Scientific and Statistical Database Management
Out-of-order processing: a new architecture for high-performance stream systems
Proceedings of the VLDB Endowment
Finding frequent items in data streams
Proceedings of the VLDB Endowment
Finding the frequent items in streams of data
Communications of the ACM - A View of Parallel Computing
Methods for finding frequent items in data streams
The VLDB Journal — The International Journal on Very Large Data Bases
HiFIND: A high-speed flow-level intrusion detection approach with DoS resiliency
Computer Networks: The International Journal of Computer and Telecommunications Networking
On concurrency control in sliding window queries over data streams
EDBT'06 Proceedings of the 10th international conference on Advances in Database Technology
Semantics of data streams and operators
ICDT'05 Proceedings of the 10th international conference on Database Theory
Streams, security and scalability
DBSec'05 Proceedings of the 19th annual IFIP WG 11.3 working conference on Data and Applications Security
International Journal of Sensor Networks
Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches
Foundations and Trends in Databases
Quantiles over data streams: an experimental study
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Hi-index | 0.00 |
Many algorithms have been proposed to approximate holistic aggregates, such as quantiles and heavy hitters, over data streams. However, little work has been done to explore what techniques are required to incorporate these algorithms in a data stream query processor, and to make them useful in practice.In this paper, we study the performance implications of using user-defined aggregate functions (UDAFs) to incorporate selection-based and sketch-based algorithms for holistic aggregates into a data stream management system's query processing architecture. We identify key performance bottlenecks and tradeoffs, and propose novel techniques to make these holistic UDAFs fast and space-efficient for use in high-speed data stream applications. We evaluate performance using generated and actual IP packet data, focusing on approximating quantiles and heavy hitters. The best of our current implementations can process streaming queries at OC48 speeds (2x 2.4Gbps).