Models and issues in data stream systems
Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
RHist: adaptive summarization over continuous data streams
Proceedings of the eleventh international conference on Information and knowledge management
Clustering Data Streams: Theory and Practice
IEEE Transactions on Knowledge and Data Engineering
A multi-dimensional histogram for selectivity estimation and fast approximate query answering
CASCON '03 Proceedings of the 2003 conference of the Centre for Advanced Studies on Collaborative research
Online Amnesic Approximation of Streaming Time Series
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Adaptive, unsupervised stream mining
The VLDB Journal — The International Journal on Very Large Data Bases
Fast range query estimation by N-level tree histograms
Data & Knowledge Engineering
estWin: Online data stream mining of recent frequent itemsets by sliding window method
Journal of Information Science
Fast and approximate stream mining of quantiles and frequencies using graphics processors
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Efficient mining method for retrieving sequential patterns over online data streams
Journal of Information Science
Finding Maximal Frequent Itemsets over Online Data Streams Adaptively
ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
A Framework for On-Demand Classification of Evolving Data Streams
IEEE Transactions on Knowledge and Data Engineering
Approximation and streaming algorithms for histogram construction problems
ACM Transactions on Database Systems (TODS)
Online outlier detection in sensor data using non-parametric models
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Adaptive, hands-off stream mining
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Estimating the output cardinality of partial preaggregation with a measure of clusteredness
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
REHIST: relative error histogram construction algorithms
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Enhancing histograms by tree-like bucket indices
The VLDB Journal — The International Journal on Very Large Data Bases
Wavelet synopsis for hierarchical range queries with workloads
The VLDB Journal — The International Journal on Very Large Data Bases
The VLDB Journal — The International Journal on Very Large Data Bases
A new approach to building histogram for selectivity estimation in query processing optimization
Computers & Mathematics with Applications
Transformation of continuous aggregation join queries over data streams
SSTD'07 Proceedings of the 10th international conference on Advances in spatial and temporal databases
Fast Discovery of Group Lag Correlations in Streams
ACM Transactions on Knowledge Discovery from Data (TKDD)
DBOD-DS: distance based outlier detection for data
DEXA'10 Proceedings of the 21st international conference on Database and expert systems applications: Part I
Workload-optimal histograms on streams
ESA'05 Proceedings of the 13th annual European conference on Algorithms
Density estimation for spatial data streams
SSTD'05 Proceedings of the 9th international conference on Advances in Spatial and Temporal Databases
Hi-index | 0.00 |
Obtaining fast and good quality approximations to data distributions is a problem of central interest to database management. A variety of popular database applications including, approximate querying, similarity searching and data mining in most application domains, rely on such good quality approximations. Histogram based approximation is a very popular method in database theory and practice to succinctly represent a data distribution in a space efficient manner.In this paper, we place the problem of histogram construction into perspective and we generalize it by raising the requirement of a finite data set and/or known data set size. We consider the case of an infinite data set on which data arrive continuously forming an infinite data stream. In this context, we present the first single pass algorithms capable of constructing histograms of provable good quality. We present algorithms for the fixed window variant of the basic histogram construction problem, supporting incremental maintenance of the histograms. The proposed algorithms trade accuracy for speed and allow for a graceful tradeoff between the two, based on application requirements.In the case of approximate queries on infinite data streams, we present a detailed experimental evaluation comparing our algorithms with other applicable techniques using real data sets, demonstrating the superiority of our proposal.