Efficient OLAP query processing in distributed data warehouses

Authors:
Michael O. Akinde;Michael H. Böhlen;Theodore Johnson;Laks V. S. Lakshmanan;Divesh Srivastava
Affiliations:
MHO Data Warehouse Unit, Computer Science Department, (SMHI), Folkborgsvägen, Sweden and Department of Computer Science, Aalborg University, Aalborg, Denmark;Department of Computer Science, Aalborg University, Fredrik Bajers Vej 7E, DK-9220 Aalborg, Denmark;AT&T Labs-Research, P.O. Box 971, Florham Park, NJ;Department of Computer Science, The University of British Columbia, 2329 West Mall, Vancouver, B.C., Canada V6T 1Z4;AT&T Labs-Research, P.O. Box 971, Florham Park, NJ
Venue:
Information Systems - Special issue: Best papers from EDBT 2002
Year:
2003


Abstract

The success of Internet applications has led to explosive growth in the demand for bandwidth from Internet Service Providers. Managing an Internet Protocol (IP) network requires collecting and analyzing network data, such as flow-level traffic statistics. Such analyses can typically be expressed as OLAP queries, e.g., correlated aggregate queries and data cubes. Current OLAP tools for this task assume that the data is available in a centralized data warehouse. However, the inherently distributed nature of data collection and the huge amount of data extracted at each collection point make it impractical to gather all data at a centralized site. One solution is to maintain a distributed data warehouse, consisting of local data warehouses at each collection point and a coordinator site, with most of the processing performed at the local sites. In this paper, we consider the problem of efficiently evaluating OLAP queries over a distributed data warehouse. We have developed the Skalla system for this task. Skalla translates OLAP queries, specified as certain algebraic expressions, into distributed evaluation plans, which are shipped to individual sites. A salient property of our approach is that only partial results are shipped, never parts of the detail data. We propose a variety of optimizations to minimize both the synchronization traffic and the local processing done at each site. Finally, we present an experimental study based on TPC-R data. Our results demonstrate the scalability of our techniques and quantify the performance benefits of the optimization techniques that have gone into the Skalla system.
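
The idea of shipping only partial results can be illustrated with a minimal sketch. It is not Skalla's algorithm or plan format, which the abstract does not describe. It assumes a simple average-per-group query, and the function names (local_partial, coordinator_merge) and the sample traffic rows are hypothetical. Each site reduces its detail rows to per-group (sum, count) pairs, and the coordinator combines those pairs without ever seeing a detail row.

```python
from collections import defaultdict


def local_partial(rows):
    """Run at one local site: reduce detail rows to per-group (sum, count).

    `rows` is an iterable of (group, value) pairs; this layout is an
    assumption made for illustration only.
    """
    partial = defaultdict(lambda: [0, 0])
    for group, value in rows:
        partial[group][0] += value  # running sum for the group
        partial[group][1] += 1      # running row count for the group
    return {g: (s, c) for g, (s, c) in partial.items()}


def coordinator_merge(partials):
    """Run at the coordinator: merge partial results into per-group averages."""
    merged = defaultdict(lambda: [0, 0])
    for partial in partials:
        for group, (s, c) in partial.items():
            merged[group][0] += s
            merged[group][1] += c
    return {g: s / c for g, (s, c) in merged.items()}


# Hypothetical flow records held at two collection points: (service, bytes).
site_a = [("web", 100), ("web", 300), ("mail", 50)]
site_b = [("web", 200), ("mail", 150)]

# Only the small per-group summaries cross the network.
result = coordinator_merge([local_partial(site_a), local_partial(site_b)])
```

Sum and count are shipped rather than a local average because averages of averages are wrong when sites hold different numbers of rows per group; the (sum, count) pair is the smallest summary that merges correctly.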