Approximating a Data Stream for Querying and Estimation: Algorithms and Performance Evaluation

Authors:
Affiliations:
Venue: ICDE '02 Proceedings of the 18th International Conference on Data Engineering
Year: 2002

Citing 0
Cited 27

Models and issues in data stream systems

Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
RHist: adaptive summarization over continuous data streams

Proceedings of the eleventh international conference on Information and knowledge management
Clustering Data Streams: Theory and Practice

IEEE Transactions on Knowledge and Data Engineering
A multi-dimensional histogram for selectivity estimation and fast approximate query answering

CASCON '03 Proceedings of the 2003 conference of the Centre for Advanced Studies on Collaborative research
Online Amnesic Approximation of Streaming Time Series

ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Adaptive, unsupervised stream mining

The VLDB Journal — The International Journal on Very Large Data Bases
Fast range query estimation by N-level tree histograms

Data & Knowledge Engineering
estWin: Online data stream mining of recent frequent itemsets by sliding window method

Journal of Information Science
Fast and approximate stream mining of quantiles and frequencies using graphics processors

Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Efficient mining method for retrieving sequential patterns over online data streams

Journal of Information Science
Finding Maximal Frequent Itemsets over Online Data Streams Adaptively

ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
A Framework for On-Demand Classification of Evolving Data Streams

IEEE Transactions on Knowledge and Data Engineering
Approximation and streaming algorithms for histogram construction problems

ACM Transactions on Database Systems (TODS)
Online outlier detection in sensor data using non-parametric models

VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Finding recently frequent itemsets adaptively over online transactional data streams

Information Systems
Adaptive, hands-off stream mining

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Estimating the output cardinality of partial preaggregation with a measure of clusteredness

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
REHIST: relative error histogram construction algorithms

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Enhancing histograms by tree-like bucket indices

The VLDB Journal — The International Journal on Very Large Data Bases
Wavelet synopsis for hierarchical range queries with workloads

The VLDB Journal — The International Journal on Very Large Data Bases
On the space-time of optimal, approximate and streaming algorithms for synopsis construction problems

The VLDB Journal — The International Journal on Very Large Data Bases
A new approach to building histogram for selectivity estimation in query processing optimization

Computers & Mathematics with Applications
Transformation of continuous aggregation join queries over data streams

SSTD'07 Proceedings of the 10th international conference on Advances in spatial and temporal databases
Fast Discovery of Group Lag Correlations in Streams

ACM Transactions on Knowledge Discovery from Data (TKDD)
DBOD-DS: distance based outlier detection for data

DEXA'10 Proceedings of the 21st international conference on Database and expert systems applications: Part I
Workload-optimal histograms on streams

ESA'05 Proceedings of the 13th annual European conference on Algorithms
Density estimation for spatial data streams

SSTD'05 Proceedings of the 9th international conference on Advances in Spatial and Temporal Databases

Abstract

Obtaining fast, good-quality approximations to data distributions is a problem of central interest to database management. A variety of popular database applications, including approximate querying, similarity searching, and data mining in most application domains, rely on such approximations. Histogram-based approximation is a very popular method in database theory and practice for representing a data distribution succinctly and space-efficiently.

In this paper, we place the problem of histogram construction into perspective and generalize it by lifting the requirement of a finite data set and/or a known data set size. We consider the case of an infinite data set in which data arrive continuously, forming an infinite data stream. In this context, we present the first single-pass algorithms capable of constructing histograms of provably good quality. We present algorithms for the fixed-window variant of the basic histogram construction problem that support incremental maintenance of the histograms. The proposed algorithms trade accuracy for speed and allow a graceful tradeoff between the two, based on application requirements.

For approximate queries on infinite data streams, we present a detailed experimental evaluation comparing our algorithms with other applicable techniques on real data sets, demonstrating the superiority of our proposal.
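To make the fixed-window setting concrete, the following is a minimal sketch of a naive baseline, not the algorithms proposed in the paper. It assumes sum of squared errors as the quality measure (the abstract does not name one), buffers the most recent values of the stream, and rebuilds an exact histogram with the classic dynamic program each time one is requested. The names `v_optimal_histogram` and `FixedWindowHistogram` are hypothetical. The paper's contribution is precisely to avoid this kind of full recomputation through single-pass, incrementally maintained approximations.

```python
from collections import deque
from itertools import accumulate


def v_optimal_histogram(values, num_buckets):
    """Exact minimum-SSE histogram by dynamic programming, O(n^2 * B) time.

    Returns (total_sse, buckets); each bucket is (start, end, mean) with
    `end` exclusive. This is an illustrative baseline only.
    """
    n = len(values)
    if n == 0:
        return 0.0, []
    b_max = min(num_buckets, n)
    # Prefix sums of values and squared values give any bucket's SSE in O(1).
    ps = [0.0] + list(accumulate(values))
    pss = [0.0] + list(accumulate(v * v for v in values))

    def sse(i, j):
        # Sum of squared errors of values[i:j] around their mean.
        s = ps[j] - ps[i]
        return max(0.0, (pss[j] - pss[i]) - s * s / (j - i))

    inf = float("inf")
    # err[k][j]: minimum SSE covering values[:j] with exactly k buckets.
    err = [[inf] * (n + 1) for _ in range(b_max + 1)]
    cut = [[0] * (n + 1) for _ in range(b_max + 1)]
    err[0][0] = 0.0
    for k in range(1, b_max + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                cand = err[k - 1][i] + sse(i, j)
                if cand < err[k][j]:
                    err[k][j] = cand
                    cut[k][j] = i

    # Walk the stored cut points backwards to recover bucket boundaries.
    buckets, j = [], n
    for k in range(b_max, 0, -1):
        i = cut[k][j]
        buckets.append((i, j, (ps[j] - ps[i]) / (j - i)))
        j = i
    buckets.reverse()
    return err[b_max][n], buckets


class FixedWindowHistogram:
    """Keeps the last `window_size` stream values and summarises them on demand."""

    def __init__(self, window_size, num_buckets):
        self.window = deque(maxlen=window_size)
        self.num_buckets = num_buckets

    def add(self, value):
        self.window.append(value)  # the oldest value is evicted automatically

    def histogram(self):
        return v_optimal_histogram(list(self.window), self.num_buckets)
```

A caller would feed each arriving value to `add` and call `histogram` whenever an approximation of the current window is needed. Every call costs time quadratic in the window size, which illustrates why incremental maintenance with a controlled loss of accuracy is attractive for streams.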