Maintaining stream statistics over multiscale sliding windows

Author: Yishan Jiao
Affiliation: Institute of Computing Technology, Chinese Academy of Sciences, and Graduate School, Chinese Academy of Sciences, Zhongguancun, Beijing, P.R. China
Venue: ACM Transactions on Database Systems (TODS)
Year: 2006

Abstract

In this article, we propose a new multiscale sliding window model that differentiates data items from different time periods of a data stream, based on a reasonable monotonicity-of-resolution assumption. As a well-motivated extension of the sliding window model, our model stands halfway between the traditional all-history and time-decaying models. We also present algorithms that estimate two important data stream statistics, F0 (the number of distinct elements) and Jaccard's similarity coefficient, with reasonable accuracy under the new model. Our algorithms use space logarithmic in the data stream size and linear in the number of windows. Their update time is logarithmic in the number of windows and independent of the required accuracy. The algorithms are easy to implement, and experimental results demonstrate their efficiency. Our techniques also apply to scenarios in which universe sampling is used.
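The abstract does not spell out the algorithms. What follows is only a minimal Python sketch of the general universe-sampling idea it mentions. It is not the paper's method and does not model the monotonicity-of-resolution assumption. The sketch rests on several assumptions of our own:

- Windows are count-based.
- One fixed sampling rate `p` is shared by every window and stream.
- A SHA-256-derived value stands in for a random hash function.

The names `MultiWindowSampler` and `unit_hash` are hypothetical. Every stream is sampled with the same hash, so the kept items form a common random sub-universe. The distinct count on the sample, scaled by 1/p, then estimates F0. The Jaccard ratio on the sample approximates the ratio on the full windows. The sketch keeps every sampled item and scans the sample on each query, so it does not achieve the space and update-time bounds stated above.

```python
import hashlib
from collections import OrderedDict


def unit_hash(item, seed=0):
    """Hypothetical helper: map an item to a pseudo-random value in [0, 1)."""
    digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
    # Keep 53 bits so the result is exactly representable and strictly below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


class MultiWindowSampler:
    """Illustrative universe-sampling sketch (not the paper's algorithm).

    One shared sample per stream answers F0 and Jaccard queries for every
    configured count-based window size.
    """

    def __init__(self, window_sizes, rate, seed=0):
        self.window_sizes = sorted(window_sizes)
        self.rate = rate      # sampling probability p, shared by all streams
        self.seed = seed      # same seed => same sampled sub-universe
        self.clock = {}       # stream id -> number of items seen so far
        self.recent = {}      # stream id -> OrderedDict(item -> last timestamp)

    def update(self, stream, item):
        """Record one arrival of `item` on `stream`."""
        t = self.clock.get(stream, 0) + 1
        self.clock[stream] = t
        sample = self.recent.setdefault(stream, OrderedDict())
        if unit_hash(item, self.seed) < self.rate:
            sample[item] = t
            sample.move_to_end(item)  # entries stay ordered by last timestamp
        # Evict sampled items that fell out of the largest window.
        horizon = t - self.window_sizes[-1]
        while sample and next(iter(sample.values())) <= horizon:
            sample.popitem(last=False)

    def _window_items(self, stream, window):
        """Sampled items whose last arrival is among the last `window` items."""
        if window not in self.window_sizes:
            raise ValueError(f"window {window} was not configured")
        cutoff = self.clock.get(stream, 0) - window
        return {x for x, t in self.recent.get(stream, {}).items() if t > cutoff}

    def f0(self, stream, window):
        """Estimate the number of distinct items in the window as |sample| / p."""
        return len(self._window_items(stream, window)) / self.rate

    def jaccard(self, a, b, window):
        """Estimate |A & B| / |A | B| on the sampled sub-universe (0.0 if both empty)."""
        sa, sb = self._window_items(a, window), self._window_items(b, window)
        union = sa | sb
        return len(sa & sb) / len(union) if union else 0.0


if __name__ == "__main__":
    # rate=1.0 keeps every item, so these answers are exact.
    sketch = MultiWindowSampler(window_sizes=[4, 8], rate=1.0)
    for x in "abacddef":
        sketch.update("A", x)
    for x in "degh":
        sketch.update("B", x)
    print(sketch.f0("A", 4))            # 3.0: last 4 items of A are d, d, e, f
    print(sketch.jaccard("A", "B", 4))  # 0.4: {d, e} out of {d, e, f, g, h}
```

Sharing one sample across all window sizes means each window query is just a timestamp cutoff on the same data. With `rate` below 1.0 the answers become estimates whose variance grows as `p` shrinks.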