Model and procedure for performance and availability-wise parallel warehouses

Authors:
Pedro Furtado
Affiliations:
University of Coimbra, Coimbra, Portugal
Venue:
Distributed and Parallel Databases
Year:
2009

Abstract

Data warehouses are large data repositories that are queried for analysis and data mining in a variety of application contexts. A query over such data may take a long time to process on a regular PC. Consider partitioning the data across a set of PCs (nodes), with either a parallel database server or any database server at each node and an engine-independent middleware. The nodes and the network may not even be fully dedicated to the data warehouse. In such a scenario, care must be taken to handle processing heterogeneity and availability, so we study and propose efficient solutions for both. We concentrate on three main contributions: a performance-wise index, which measures relative performance; a replication degree; and a flexible chunk-wise organization with on-demand processing. These contributions extend previous work on de-clustering and replication and are generic in the sense that they can be applied in very different contexts and with different data partitioning approaches. We evaluate their merits with a prototype implementation of the system.
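
To make the three ideas above more concrete, the following is a minimal illustrative sketch and not the paper's actual algorithm, which the abstract does not specify. It assumes a simple round-robin placement of chunk replicas, a unit processing cost per chunk, and a relative index normalized to the fastest node. All names (`place_chunks`, `performance_index`, `run_on_demand`) are hypothetical.

```python
import heapq


def place_chunks(num_chunks, num_nodes, replication_degree):
    """Assumed placement: chunk c is stored on `replication_degree`
    consecutive nodes, starting at node c mod num_nodes."""
    return {
        c: [(c + k) % num_nodes for k in range(replication_degree)]
        for c in range(num_chunks)
    }


def performance_index(speeds):
    """Assumed definition: each node's speed relative to the fastest node."""
    fastest = max(speeds)
    return [s / fastest for s in speeds]


def run_on_demand(placement, speeds, failed=frozenset()):
    """Simulate idle nodes pulling chunks on demand.

    Each live node, when idle, takes the lowest-numbered pending chunk
    for which it holds a replica. Returns (assignment, makespan), where
    assignment maps chunk -> node that processed it.
    """
    pending = set(placement)
    assignment = {}
    # Priority queue of (time at which the node becomes idle, node id).
    idle = [(0.0, n) for n in range(len(speeds)) if n not in failed]
    heapq.heapify(idle)
    makespan = 0.0
    while pending and idle:
        now, node = heapq.heappop(idle)
        local = sorted(c for c in pending if node in placement[c])
        if not local:
            # No pending chunk has a replica here; since the pending set
            # only shrinks, this node has nothing further to do.
            continue
        chunk = local[0]
        pending.remove(chunk)
        assignment[chunk] = node
        finish = now + 1.0 / speeds[node]  # unit work per chunk (assumption)
        makespan = max(makespan, finish)
        heapq.heappush(idle, (finish, node))
    if pending:
        raise RuntimeError(f"chunks unavailable on live nodes: {sorted(pending)}")
    return assignment, makespan


if __name__ == "__main__":
    speeds = [1.0, 1.0, 1.0, 0.5]  # node 3 is half as fast
    placement = place_chunks(12, 4, replication_degree=2)
    print(performance_index(speeds))
    print(run_on_demand(placement, speeds))
    print(run_on_demand(placement, speeds, failed={3}))
```

In this sketch, a slower node simply requests chunks less often, so heterogeneity is absorbed without a fixed up-front work split, and a replication degree of 2 lets the remaining nodes cover every chunk when a single node fails.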