In this paper, we investigate the problem of estimating a target database from summary databases derived from a base data cube. We show that such estimates can be derived by choosing a primary database that uses a proxy database to estimate the results. This technique is common in statistics; the issue we address is the accuracy of the resulting estimates. Specifically, given multiple primary and proxy databases that share the same summary measure, the problem is to select the primary and proxy databases that yield the most accurate estimate of the target database. We propose an algorithm, based on the principles of information entropy, that determines the steps for selecting or computing the source databases from multiple summary databases. We show that the source databases with the largest number of cells in common provide the most accurate estimates, and we prove that this criterion is consistent with maximizing entropy. Experimental results on the accuracy of target database estimation confirm these findings.
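The primary/proxy idea can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each summary database is a dict from category tuples to a SUM measure, and that the estimator splits each primary cell in proportion to the proxy's distribution conditioned on the dimensions the two share (a proportional-allocation, conditional-independence assumption). The helpers `marginal`, `proxy_estimate`, `common_cells`, `choose_proxy` and `entropy`, the sample data, and the reading of "cells in common" as the number of distinct proxy cells over the shared dimensions are all illustrative assumptions, not the authors' algorithm.

```python
from collections import defaultdict
from math import log2

# Hypothetical model: a summary database is a dict mapping a tuple of category
# values (one per dimension, in the order given by `dims`) to a SUM measure.


def marginal(cells, dims, keep):
    """Aggregate (sum) a summary table onto the dimensions listed in `keep`."""
    idx = [dims.index(d) for d in keep]
    out = defaultdict(float)
    for key, value in cells.items():
        out[tuple(key[i] for i in idx)] += value
    return dict(out)


def proxy_estimate(primary, p_dims, proxy, x_dims):
    """Split each primary cell in proportion to the proxy's distribution over
    the proxy-only dimensions, conditioned on the dimensions both share."""
    shared = [d for d in p_dims if d in x_dims]
    extra = [d for d in x_dims if d not in shared]
    # Group proxy cells by their shared-dimension values.
    groups = defaultdict(list)
    for key, value in proxy.items():
        s = tuple(key[x_dims.index(d)] for d in shared)
        e = tuple(key[x_dims.index(d)] for d in extra)
        groups[s].append((e, value))
    target = {}
    for p_key, p_val in primary.items():
        s = tuple(p_key[p_dims.index(d)] for d in shared)
        denom = sum(v for _, v in groups.get(s, []))
        if denom == 0:
            continue  # the proxy carries no information for this cell
        for e, v in groups[s]:
            target[p_key + e] = p_val * v / denom
    return target, list(p_dims) + extra


def common_cells(p_dims, proxy, x_dims):
    """Hypothetical reading of 'cells in common': the number of distinct cells
    the proxy has over the dimensions it shares with the primary."""
    shared = [d for d in p_dims if d in x_dims]
    return len(marginal(proxy, x_dims, shared))


def choose_proxy(p_dims, candidates):
    """Pick the (proxy, x_dims) candidate with the most cells in common."""
    return max(candidates, key=lambda c: common_cells(p_dims, c[0], c[1]))


def entropy(cells):
    """Shannon entropy (bits) of the distribution implied by a SUM measure."""
    total = sum(cells.values())
    return -sum(v / total * log2(v / total) for v in cells.values() if v > 0)


# Illustrative (made-up) data: primary by (region, year), two candidate proxies.
primary = {("N", 2020): 100, ("N", 2021): 200, ("S", 2020): 50, ("S", 2021): 150}
p_dims = ("region", "year")
x1 = {(2020, "a"): 30, (2020, "b"): 10, (2021, "a"): 20, (2021, "b"): 20}
x1_dims = ("year", "product")
x2 = {("N", 2020, "a"): 5, ("N", 2020, "b"): 5, ("N", 2021, "a"): 1,
      ("N", 2021, "b"): 3, ("S", 2020, "a"): 2, ("S", 2020, "b"): 2,
      ("S", 2021, "a"): 3, ("S", 2021, "b"): 1}
x2_dims = ("region", "year", "product")

target, t_dims = proxy_estimate(primary, p_dims, x1, x1_dims)
print(t_dims, target[("N", 2020, "a")])  # ['region', 'year', 'product'] 75.0
best, best_dims = choose_proxy(p_dims, [(x1, x1_dims), (x2, x2_dims)])
print(best_dims)  # ('region', 'year', 'product')
```

Here `x2` is selected because it shares both `region` and `year` with the primary (4 common cells versus 2 for `x1`). One loose link to the entropy argument: a distribution over n cells has entropy at most log2 n, so more common cells raise that ceiling. This is an illustration only, not a reproduction of the paper's proof.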