Processing large data streams is now a major topic in data management. The data involved can be truly massive, and the required analyses complex. In a stream of sequential events such as stock feeds, sensor readings, or IP traffic measurements, data tuples pertaining to recent events are typically more important than older ones. This can be formalized via time-decay functions, which assign weights to data based on its age. Decay functions such as sliding windows and exponential decay have been studied under the assumption of well-ordered arrivals, i.e., data arrives in non-decreasing order of timestamps. However, data quality issues are prevalent in massive streams (due to network asynchrony, delays, and similar causes), and correct arrival order is not guaranteed. We focus on the computation of decayed aggregates such as range queries, quantiles, and heavy hitters on out-of-order streams, where elements do not necessarily arrive in increasing order of timestamps. Existing techniques such as Exponential Histograms and Waves are unable to handle out-of-order streams. We give the first deterministic algorithms for approximating these aggregates under popular decay functions such as sliding window and polynomial decay. We study the overhead of allowing out-of-order arrivals when compared to well-ordered arrivals, both analytically and experimentally. Our experiments confirm that these algorithms can be applied in practice, and compare the relative performance of different approaches for handling out-of-order arrivals.
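To make the notion of a time-decay function concrete, the following is a minimal sketch and not the algorithms of the paper. It computes an exact decayed sum by brute force over a small out-of-order stream, storing every item, which is precisely what a small-space streaming summary avoids. The function names, the parameters `window`, `lam`, and `alpha`, and the particular forms chosen for the exponential and polynomial weights are illustrative assumptions.

```python
import math

# Illustrative decay functions (names and exact forms are assumptions).
# Each maps the age of an item (query time minus timestamp) to a weight.

def sliding_window(age, window):
    """Weight 1 for items at most `window` time units old, 0 otherwise."""
    return 1.0 if 0 <= age <= window else 0.0

def exponential_decay(age, lam):
    """Weight exp(-lam * age); items from the future get weight 0."""
    return math.exp(-lam * age) if age >= 0 else 0.0

def polynomial_decay(age, alpha):
    """Weight (age + 1) ** (-alpha); items from the future get weight 0."""
    return (age + 1.0) ** (-alpha) if age >= 0 else 0.0

def decayed_sum(stream, now, decay):
    """Exact decayed sum over (timestamp, value) pairs.

    Brute force: keeps and scans every item, so arrival order is
    irrelevant. A streaming algorithm would approximate this quantity
    in small space instead.
    """
    return sum(value * decay(now - ts) for ts, value in stream)

# Timestamps arrive out of order: 1, 5, 3, 9, 7.
stream = [(1, 1.0), (5, 1.0), (3, 1.0), (9, 1.0), (7, 1.0)]
now = 10

window_count = decayed_sum(stream, now, lambda a: sliding_window(a, 4))
poly_sum = decayed_sum(stream, now, lambda a: polynomial_decay(a, 1.0))
exp_sum = decayed_sum(stream, now, lambda a: exponential_decay(a, 0.5))
```

Because the weight of an item depends only on its timestamp and the query time, the exact answer is the same for any arrival order. The difficulty addressed in the abstract is maintaining an approximation of it in limited space when late items can land anywhere in the past.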