Efficient estimation of joint queries from multiple OLAP databases

Authors: Elaheh Pourabbas; Arie Shoshani
Affiliations: National Research Council, Rome, Italy; Lawrence Berkeley National Laboratory, Berkeley, CA
Venue: ACM Transactions on Database Systems (TODS)
Year: 2007

Abstract

Given an OLAP query expressed over multiple source OLAP databases, we study the problem of estimating the resulting OLAP target database. The problem arises when it is not possible to derive the result from a single database. The method we use is linear indirect estimation, commonly used for statistical estimation. We examine two obvious computational methods for computing such a target database, called the full cross-product (F) and preaggregation (P) methods. We study the accuracy and computational cost of these methods. While the F method provides a more accurate estimate, it is more expensive computationally than P. Our contribution is in proposing a third, new method, called the partial preaggregation method (PP), which is significantly less expensive than F, but just as accurate. We prove formally that the PP method yields the same results as the F method, and provide analytical and experimental results on the accuracy and computational benefits of the PP method.
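
The relationship between the three methods can be illustrated with a small sketch. The abstract does not give the estimator or the exact preaggregation steps, so everything below is an assumption made for illustration: two toy sources A and B share one common dimension c, the estimator is the simple proportional form A * B / N(c), F multiplies at full detail before summing, P collapses every non-target dimension (including c) first, and PP collapses only the dimensions that are neither shared nor in the target. The data, dimension names, and function names are hypothetical.

```python
from itertools import product
from math import isclose

# Hypothetical toy sources sharing one common dimension c.
# A[c][x][u]: counts by common dim c, target dim x, extra dim u.
# B[c][y][v]: counts by common dim c, target dim y, extra dim v.
A = [[[4, 6], [10, 0]],
     [[2, 4], [6, 28]]]
B = [[[5, 5], [2, 8]],
     [[18, 12], [2, 8]]]

C, X, U = len(A), len(A[0]), len(A[0][0])
Y, V = len(B[0]), len(B[0][0])

# Total per value of the common dimension, taken from source A (an assumption).
total = [sum(A[c][x][u] for x in range(X) for u in range(U)) for c in range(C)]


def estimate_full():
    """F (as assumed here): multiply at full detail, then sum out c, u, v."""
    est = [[0.0] * Y for _ in range(X)]
    for c, x, u, y, v in product(range(C), range(X), range(U), range(Y), range(V)):
        est[x][y] += A[c][x][u] * B[c][y][v] / total[c]
    return est


def estimate_partial_preagg():
    """PP (as assumed here): sum out u and v first, keep c for the product."""
    a2 = [[sum(A[c][x]) for x in range(X)] for c in range(C)]
    b2 = [[sum(B[c][y]) for y in range(Y)] for c in range(C)]
    est = [[0.0] * Y for _ in range(X)]
    for c, x, y in product(range(C), range(X), range(Y)):
        est[x][y] += a2[c][x] * b2[c][y] / total[c]
    return est


def estimate_preagg():
    """P (as assumed here): sum out c, u, v first, then combine the marginals."""
    a3 = [sum(sum(A[c][x]) for c in range(C)) for x in range(X)]
    b3 = [sum(sum(B[c][y]) for c in range(C)) for y in range(Y)]
    grand = sum(total)
    return [[a3[x] * b3[y] / grand for y in range(Y)] for x in range(X)]


if __name__ == "__main__":
    for name, fn in [("F", estimate_full),
                     ("PP", estimate_partial_preagg),
                     ("P", estimate_preagg)]:
        print(name, fn())
```

Under these assumptions F and PP agree because the estimator is linear in each source, so summing out u and v before or after the product gives the same value, while the PP loop visits far fewer cells. P discards the shared dimension before combining and so generally produces a different estimate on data like this.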