Random sampling with a reservoir
ACM Transactions on Mathematical Software (TOMS)
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
New sampling-based summary statistics for improving approximate query answers
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Join synopses for approximate query answering
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Dynamic generation of discrete random variates
SODA '93 Proceedings of the fourth annual ACM-SIAM Symposium on Discrete algorithms
Approximate data structures with applications
SODA '94 Proceedings of the fifth annual ACM-SIAM symposium on Discrete algorithms
Congressional samples for approximate answering of group-by queries
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Large-Sample and Deterministic Confidence Intervals for Online Aggregation
SSDBM '97 Proceedings of the Ninth International Conference on Scientific and Statistical Database Management
DSQoS-distributed architecture providing QoS in summary warehouses
DOLAP '03 Proceedings of the 6th ACM international workshop on Data warehousing and OLAP
Estimating the overlapping area of polygon join
SSTD'05 Proceedings of the 9th international conference on Advances in Spatial and Temporal Databases
Hi-index | 0.00 |
In large data warehouses it is possible to return very fast approximate answers to user queries using pre-computed sampling summaries well-fit for all types of exploration analysis. However, their usage is constrained by the fact that there must be a representative number of samples in grouping intervals to yield acceptable accuracy. In this paper we propose and evaluate a technique that deals with the representation issue by using time interval-biased stratified samples (TISS). The technique is able to deliver fast accurate analysis to the user by taking advantage of the importance of the time dimension in most user analysis. It is designed as a transparent middle layer, which analyzes and rewrites the query to use a summary instead of the base data warehouse. The estimations and error bounds returned using the technique are compared to those of traditional sampling summaries, to show that it achieves significant improvement in accuracy.