The success of Internet applications has led to explosive growth in the demand for bandwidth from ISPs. Managing an IP network requires collecting and analyzing network data, such as flow-level traffic statistics. Such analyses can typically be expressed as OLAP queries, e.g., correlated aggregate queries and data cubes. Current OLAP tools for this task assume that the data is available in a centralized data warehouse. However, the inherently distributed nature of data collection and the huge amount of data extracted at each collection point make it impractical to gather all data at a centralized site. One solution is to maintain a distributed data warehouse, consisting of local data warehouses at each collection point and a coordinator site, with most of the processing performed at the local sites. In this paper, we consider the problem of efficiently evaluating OLAP queries over a distributed data warehouse. We have developed the Skalla system for this task. Skalla translates OLAP queries, specified as certain algebraic expressions, into distributed evaluation plans that are shipped to individual sites. A salient property of our approach is that only partial results are shipped, never parts of the detail data. We propose a variety of optimizations to minimize both the synchronization traffic and the local processing done at each site. Finally, we present an experimental study based on TPC(R) data. Our results demonstrate the scalability of our techniques and quantify the performance benefits of the optimization techniques that have gone into the Skalla system.
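The idea of shipping only partial results can be illustrated with a minimal sketch. This is not Skalla's plan format or algorithm; the site names, the flow-record layout, and the functions `local_partial` and `coordinator_merge` are all hypothetical, chosen only to show how an average can be computed across sites without moving detail rows.

```python
from collections import defaultdict

# Hypothetical flow records at each collection point: (source_as, dest_as, bytes).
# The layout and values are illustrative assumptions, not taken from the paper.
SITE_DATA = {
    "site_a": [("AS1", "AS2", 100), ("AS1", "AS3", 300)],
    "site_b": [("AS1", "AS2", 500), ("AS2", "AS3", 50)],
}


def local_partial(flows):
    """Runs at a local site: reduce detail rows to per-group (sum, count)."""
    partial = defaultdict(lambda: [0, 0])
    for src, _dst, nbytes in flows:
        partial[src][0] += nbytes  # running sum of bytes for this source
        partial[src][1] += 1       # running count of flows for this source
    return {group: tuple(acc) for group, acc in partial.items()}


def coordinator_merge(partials):
    """Runs at the coordinator: combine partial results into final averages."""
    merged = defaultdict(lambda: [0, 0])
    for partial in partials:
        for group, (total, count) in partial.items():
            merged[group][0] += total
            merged[group][1] += count
    # AVG is derived from the merged SUM and COUNT; no detail row is needed.
    return {group: total / count for group, (total, count) in merged.items()}


# Only the small per-group partials cross the network, never the flow records.
partials = [local_partial(flows) for flows in SITE_DATA.values()]
avg_bytes = coordinator_merge(partials)
print(avg_bytes)
```

The sketch ships (sum, count) pairs rather than local averages because averages of averages are not correct when groups have different sizes at different sites.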