Model and procedure for performance and availability-wise parallel warehouses

  • Authors: Pedro Furtado
  • Affiliations: University of Coimbra, Coimbra, Portugal
  • Venue: Distributed and Parallel Databases
  • Year: 2009

Abstract

Consider data warehouses as large data repositories queried for analysis and data mining in a variety of application contexts. A query over such data may take a long time to process on a regular PC. Consider partitioning the data across a set of PCs (nodes), with either a parallel database server or any database server at each node and an engine-independent middleware. The nodes and the network may not even be fully dedicated to the data warehouse. In such a scenario, care must be taken in handling processing heterogeneity and availability, so we study and propose efficient solutions for both. We concentrate on three main contributions: a performance-wise index that measures relative performance; a replication degree; and a flexible chunk-wise organization with on-demand processing. These contributions extend previous work on de-clustering and replication, and are generic in the sense that they can be applied in very different contexts and with different data partitioning approaches. We evaluate their merits with a prototype implementation of the system.