On futuristic query processing in data streams

Authors:
Charu C. Aggarwal
Affiliations:
IBM T.J. Watson Research Center, Hawthorne, NY
Venue:
EDBT'06 Proceedings of the 10th international conference on Advances in Database Technology
Year:
2006

Citing 14
Cited 0

Approximate computation of multidimensional aggregates of sparse data using wavelets

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Approximating multi-dimensional aggregate range queries over real attributes

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Hancock: a language for extracting signatures from data streams

Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining high-speed data streams

Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Scalability for clustering algorithms revisited

ACM SIGKDD Explorations Newsletter
Models and issues in data stream systems

Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Processing complex aggregate queries over data streams

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Surfing Wavelets on Streams: One-Pass Summaries for Approximate Aggregate Queries

Proceedings of the 27th International Conference on Very Large Data Bases
A framework for diagnosing changes in evolving data streams

Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Streaming-Data Algorithms for High-Quality Clustering

ICDE '02 Proceedings of the 18th International Conference on Data Engineering
Multi-dimensional regression analysis of time-series data streams

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Approximate frequency counts over data streams

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
How to summarize the universe: dynamic maintenance of quantiles

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
A framework for clustering evolving data streams

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29

Quantified Score

Hi-index	0.00

Visualization

Abstract

Recent advances in hardware technology have resulted in the ability to collect and process large amounts of data. In many cases, the collection of the data is a continuous process over time. Such continuous collections of data are referred to as data streams. One of the interesting problems in data stream mining is that of predictive query processing. This is useful for a variety of data mining applications which require us to estimate the future behavior of the data stream. In this paper, we will discuss the problem from the point of view of predictive summarization. In predictive summarization, we would like to store statistical characteristics of the data stream which are useful for estimation of queries representing the behavior of the stream in the future. The example utilized for this paper is the case of selectivity estimation of range queries. For this purpose, we propose a technique which utilizes a local predictive approach in conjunction with a careful choice of storing and summarizing particular statistical characteristics of the data. We use this summarization technique to estimate the future selectivity of range queries, though the results can be utilized to estimate a variety of futuristic queries. We test the results on a variety of data sets and illustrate the effectiveness of the approach.