Recent years have witnessed an increasing interest in designing algorithms for querying and analyzing streaming data (i.e., data that is seen only once in a fixed order) with only limited memory. Providing (perhaps approximate) answers to queries over such continuous data streams is a crucial requirement for many application environments. Examples include large telecom and IP network installations, where performance data from different parts of the network needs to be continuously collected and analyzed.

In this paper, we consider the problem of approximately answering general aggregate SQL queries over continuous data streams with limited memory. Our method relies on randomized techniques that compute small "sketch" summaries of the streams. These sketches can then be used to provide approximate answers to aggregate queries with provable guarantees on the approximation error. We also demonstrate how existing statistical information on the base data (e.g., histograms) can be used in the proposed framework to improve the quality of the approximation provided by our algorithms. The key idea is to intelligently partition the domain of the underlying attribute(s) and thus decompose the sketching problem in a way that provably tightens our guarantees. Results of our experimental study with real-life as well as synthetic data streams indicate that sketches provide significantly more accurate answers than histograms for aggregate queries. This is especially true when our domain partitioning methods are employed to further boost the accuracy of the final estimates.
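As a rough illustration of the sketching and domain-partitioning ideas described above, the Python code below estimates the simplest such aggregate: COUNT over an equi-join of two streams R and S. The join size is the sum over v of f_R(v) * f_S(v), where f_R and f_S are the value frequencies in each stream.

The code uses AMS-style atomic sketches of the form X = sum_v f(v) * xi(v), with four-wise independent +/-1 variables xi. It then shows domain partitioning by keeping one independent sketch per value range.

This is a minimal sketch under stated assumptions, not the paper's algorithm:

- It covers only a single two-way join.
- The polynomial-hash +/-1 generator, the parameters s1 and s2, and the bucket boundaries are illustrative choices.
- The names FourWiseSign, AMSSketch, PartitionedSketch, estimate_join and estimate_partitioned_join are hypothetical.
- The paper's use of histograms to choose the partitions is not reproduced.

```python
# Illustrative sketch only: class/function names, the hash family, and all
# parameters are assumptions, not taken from the paper.
import bisect
import random
import statistics
from typing import List, Sequence

PRIME = (1 << 31) - 1  # Mersenne prime; hash values live in GF(PRIME)


class FourWiseSign:
    """Random +/-1 variables xi(v) from a random cubic polynomial mod PRIME.

    Degree-3 polynomial hashing is 4-wise independent; mapping the hash to its
    parity gives +/-1 values with a bias of at most 1/PRIME.
    """

    def __init__(self, rng: random.Random) -> None:
        self.coeffs = [rng.randrange(PRIME) for _ in range(4)]

    def __call__(self, value: int) -> int:
        h = 0
        for c in self.coeffs:  # Horner's rule evaluation of the polynomial
            h = (h * value + c) % PRIME
        return 1 if h & 1 else -1


class AMSSketch:
    """s1 * s2 atomic sketches X_i = sum_v f(v) * xi_i(v) over one stream."""

    def __init__(self, s1: int, s2: int, seed: int) -> None:
        rng = random.Random(seed)
        self.s1, self.s2, self.seed = s1, s2, seed
        self.signs = [FourWiseSign(rng) for _ in range(s1 * s2)]
        self.counters: List[int] = [0] * (s1 * s2)

    def update(self, value: int, weight: int = 1) -> None:
        # Linear update, so deletions are just negative weights.
        for i, xi in enumerate(self.signs):
            self.counters[i] += weight * xi(value)


def estimate_join(r: AMSSketch, s: AMSSketch) -> float:
    """Estimate COUNT(R join S) = sum_v f_R(v) * f_S(v).

    Each product X_R * X_S is a (nearly) unbiased estimate; averaging s1 of
    them reduces the variance, and the median of s2 averages boosts confidence.
    """
    # The two sketches must share the same xi families (same shape and seed).
    if (r.s1, r.s2, r.seed) != (s.s1, s.s2, s.seed):
        raise ValueError("sketches were built with different xi families")
    means = []
    for g in range(r.s2):
        block = range(g * r.s1, (g + 1) * r.s1)
        means.append(sum(r.counters[i] * s.counters[i] for i in block) / r.s1)
    return statistics.median(means)


class PartitionedSketch:
    """One independent AMSSketch per domain bucket.

    boundaries[k] is the inclusive lower edge of bucket k + 1; how to choose
    them (e.g. from a histogram) is left open here.
    """

    def __init__(self, boundaries: Sequence[int], s1: int, s2: int, seed: int) -> None:
        self.boundaries = sorted(boundaries)
        self.parts = [AMSSketch(s1, s2, seed + k) for k in range(len(boundaries) + 1)]

    def update(self, value: int, weight: int = 1) -> None:
        self.parts[bisect.bisect_right(self.boundaries, value)].update(value, weight)


def estimate_partitioned_join(r: PartitionedSketch, s: PartitionedSketch) -> float:
    # With identical boundaries, equal join values always land in the same
    # bucket, so the per-bucket estimates can simply be summed.
    return sum(estimate_join(pr, ps) for pr, ps in zip(r.parts, s.parts))


if __name__ == "__main__":
    R = [1] * 3 + [2] * 5 + [3] * 2
    S = [1] * 4 + [2] * 1 + [3] * 6  # exact join size: 12 + 5 + 12 = 29

    plain_r, plain_s = AMSSketch(16, 5, seed=7), AMSSketch(16, 5, seed=7)
    part_r = PartitionedSketch([2, 3], 16, 5, seed=7)
    part_s = PartitionedSketch([2, 3], 16, 5, seed=7)
    for v in R:
        plain_r.update(v)
        part_r.update(v)
    for v in S:
        plain_s.update(v)
        part_s.update(v)

    print(estimate_join(plain_r, plain_s))            # randomized estimate
    print(estimate_partitioned_join(part_r, part_s))  # 29.0 (one value per bucket)
```

In this sketch, the error bound for each bucket depends only on the self-join sizes inside that bucket. Splitting the domain so that heavy values fall into separate buckets can therefore reduce the overall error. In the toy run above, every bucket holds a single value, so the partitioned estimate is exact.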