A universal-scheme approach to statistical databases containing homogeneous summary tables
ACM Transactions on Database Systems (TODS)
Information synthesis in statistical databases
CIKM '95 Proceedings of the fourth international conference on Information and knowledge management
OLAP and statistical databases: similarities and differences
PODS '97 Proceedings of the sixteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Modeling Multidimensional Databases
ICDE '97 Proceedings of the Thirteenth International Conference on Data Engineering
Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total
ICDE '96 Proceedings of the Twelfth International Conference on Data Engineering
A Foundation for Multi-dimensional Databases
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Recovering Information from Summary Data
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Summarizability in OLAP and Statistical Data Bases
SSDBM '97 Proceedings of the Ninth International Conference on Scientific and Statistical Database Management
Improving estimation accuracy of aggregate queries on data cubes
Proceedings of the ACM 11th international workshop on Data warehousing and OLAP
Improving estimation accuracy of aggregate queries on data cubes
Data & Knowledge Engineering
Hi-index | 0.00 |
Given an OLAP query expressed over multiple source OLAP databases, we study the problem of estimating the resulting OLAP target database. The problem arises when it is not possible to derive the result from a single database. The method we use is linear indirect estimation, commonly used for statistical estimation. We examine two obvious computational methods for computing such a target database, called the full cross-product (F) and preaggregation (P) methods. We study the accuracy and computational cost of these methods. While the F method provides a more accurate estimate, it is more expensive computationally than P. Our contribution is in proposing a third, new method, called the partial preaggregation method (PP), which is significantly less expensive than F, but just as accurate. We prove formally that the PP method yields the same results as the F method, and provide analytical and experimental results on the accuracy and computational benefits of the PP method.