Processing complex aggregate queries over data streams

Authors:
Alin Dobra; Minos Garofalakis; Johannes Gehrke; Rajeev Rastogi
Affiliations:
Cornell University; Bell Labs, Lucent; Cornell University; Bell Labs, Lucent
Venue:
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Year:
2002


Abstract

Recent years have witnessed an increasing interest in designing algorithms for querying and analyzing streaming data (i.e., data that is seen only once in a fixed order) with only limited memory. Providing (perhaps approximate) answers to queries over such continuous data streams is a crucial requirement for many application environments; examples include large telecom and IP network installations where performance data from different parts of the network needs to be continuously collected and analyzed.

In this paper, we consider the problem of approximately answering general aggregate SQL queries over continuous data streams with limited memory. Our method relies on randomizing techniques that compute small "sketch" summaries of the streams that can then be used to provide approximate answers to aggregate queries with provable guarantees on the approximation error. We also demonstrate how existing statistical information on the base data (e.g., histograms) can be used in the proposed framework to improve the quality of the approximation provided by our algorithms. The key idea is to intelligently partition the domain of the underlying attribute(s) and, thus, decompose the sketching problem in a way that provably tightens our guarantees. Results of our experimental study with real-life as well as synthetic data streams indicate that sketches provide significantly more accurate answers than histograms for aggregate queries. This is especially true when our domain partitioning methods are employed to further boost the accuracy of the final estimates.
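The abstract does not spell out how the sketches are built, so the following is only an illustrative sketch under stated assumptions, not the authors' algorithm. It assumes the well-known "tug-of-war" style of sketch: each stream keeps a few counters, every arriving join-attribute value adds a pseudo-random +1 or -1 to each counter, and the product of two streams' counters estimates the size of their equi-join (a COUNT aggregate). Averaging within groups and taking the median across groups is the usual way to tighten the estimate. All names here (`SignHash`, `StreamSketch`, `estimate_join_size`, `estimate_partitioned`) are hypothetical, as is the choice of a degree-3 polynomial hash and the equal-sized per-partition sketches in the last function, which only gestures at the domain-partitioning idea.

```python
import random
from statistics import median

PRIME = 2**61 - 1  # modulus for the polynomial hash (an assumption of this sketch)


class SignHash:
    """Random degree-3 polynomial mod PRIME, mapped to +1/-1 by its low bit."""

    def __init__(self, rng):
        self.coeffs = [rng.randrange(PRIME) for _ in range(4)]

    def __call__(self, value):
        acc = 0
        for c in self.coeffs:  # Horner evaluation of the polynomial
            acc = (acc * value + c) % PRIME
        return 1 if acc & 1 else -1


class StreamSketch:
    """One counter per hash; values are non-negative integer join keys."""

    def __init__(self, hashes):
        self.hashes = hashes  # joined streams must share the same hashes
        self.counters = [0] * len(hashes)

    def update(self, value, weight=1):
        # weight=-1 models a deletion; the summary is linear in the updates.
        for i, h in enumerate(self.hashes):
            self.counters[i] += weight * h(value)


def estimate_join_size(r, s, group_size):
    """Median over groups of the mean counter product (median-of-means)."""
    products = [x * y for x, y in zip(r.counters, s.counters)]
    means = []
    for i in range(0, len(products), group_size):
        chunk = products[i:i + group_size]
        means.append(sum(chunk) / len(chunk))
    return median(means)


def estimate_partitioned(r_parts, s_parts, group_size):
    """Sum of per-partition estimates; partition p of both streams covers the
    same slice of the attribute domain (equal sketch sizes assumed here)."""
    return sum(estimate_join_size(r, s, group_size)
               for r, s in zip(r_parts, s_parts))


if __name__ == "__main__":
    rng = random.Random(0)
    hashes = [SignHash(rng) for _ in range(500)]
    r, s = StreamSketch(hashes), StreamSketch(hashes)
    freq_r = {v: 1 + v % 7 for v in range(200)}   # synthetic frequencies
    freq_s = {v: 1 + v % 3 for v in range(200)}
    for v, f in freq_r.items():
        r.update(v, f)
    for v, f in freq_s.items():
        s.update(v, f)
    exact = sum(freq_r[v] * freq_s[v] for v in freq_r)
    print("exact join size:", exact)
    print("sketch estimate:", estimate_join_size(r, s, group_size=100))
```

The memory used depends only on the number of counters, not on the stream length, which is what makes this kind of summary attractive in the limited-memory setting the abstract describes. The estimate is randomized, so repeated runs with different seeds give different values around the exact answer.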