Efficient OLAP Query Processing in Distributed Data Warehouses

Authors:
Michael O. Akinde;Michael H. Böhlen;Theodore Johnson;Laks V. S. Lakshmanan;Divesh Srivastava
Venue:
EDBT '02 Proceedings of the 8th International Conference on Extending Database Technology: Advances in Database Technology
Year:
2002

Abstract

The success of Internet applications has led to explosive growth in the demand for bandwidth from ISPs. Managing an IP network requires collecting and analyzing network data, such as flow-level traffic statistics. Such analyses can typically be expressed as OLAP queries, e.g., correlated aggregate queries and data cubes. Current OLAP tools for this task assume that the data is available in a centralized data warehouse. However, the inherently distributed nature of data collection and the huge amount of data extracted at each collection point make it impractical to gather all data at a centralized site. One solution is to maintain a distributed data warehouse, consisting of local data warehouses at each collection point and a coordinator site, with most of the processing performed at the local sites. In this paper, we consider the problem of efficiently evaluating OLAP queries over a distributed data warehouse. We have developed the Skalla system for this task. Skalla translates OLAP queries, specified as certain algebraic expressions, into distributed evaluation plans, which are shipped to individual sites. A salient property of our approach is that only partial results are shipped, never parts of the detail data. We propose a variety of optimizations to minimize both the synchronization traffic and the local processing done at each site. Finally, we present an experimental study based on TPC(R) data. Our results demonstrate the scalability of our techniques and quantify the performance benefits of the optimization techniques that have gone into the Skalla system.
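
The idea of shipping only partial results, never detail data, can be illustrated with a minimal sketch. This is not Skalla's evaluation plan or algebra, which the abstract does not spell out; it only shows the general technique of computing a distributive partial aggregate at each local site and merging at a coordinator. The flow-record layout, the function names `local_partial` and `coordinator_merge`, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical flow record: (group key, bytes transferred).
Flow = Tuple[str, int]
# Partial result per group: (sum of bytes, number of flows).
Partial = Dict[str, Tuple[int, int]]


def local_partial(flows: Iterable[Flow]) -> Partial:
    """Runs at a local site: reduce detail rows to per-group (sum, count)."""
    acc = defaultdict(lambda: [0, 0])
    for key, nbytes in flows:
        acc[key][0] += nbytes
        acc[key][1] += 1
    return {k: (s, c) for k, (s, c) in acc.items()}


def coordinator_merge(partials: Iterable[Partial]) -> Dict[str, float]:
    """Runs at the coordinator: merge partials into a per-group average."""
    total = defaultdict(lambda: [0, 0])
    for partial in partials:
        for key, (s, c) in partial.items():
            total[key][0] += s
            total[key][1] += c
    return {k: s / c for k, (s, c) in total.items()}


# Assumed sample data held at two collection points.
site_a = [("AS1", 100), ("AS1", 300), ("AS2", 50)]
site_b = [("AS1", 200), ("AS2", 150)]

# Only the small (sum, count) dictionaries cross the network.
result = coordinator_merge([local_partial(site_a), local_partial(site_b)])
```

The average is split into a sum and a count because averages themselves cannot be merged across sites, whereas sums and counts can. The detail rows in `site_a` and `site_b` never leave their sites.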