Community Training: Partitioning Schemes in Good Shape for Federated Data Grids

  • Authors:
  • Tobias Scholl; Richard Kuntschke; Angelika Reiser; Alfons Kemper

  • Venue:
  • E-SCIENCE '07 Proceedings of the Third IEEE International Conference on e-Science and Grid Computing
  • Year:
  • 2007

Abstract

In federated Data Grids, individual institutions share their data sets within a community to enable collaborative data analysis. Data access must scale, since in most e-science communities data sets not only grow exponentially but also become increasingly popular. If data autonomy is retained, each individual institution has to ensure efficient access to its own data. Analyzing application-specific data properties (such as data skew) or query characteristics (such as query patterns), and distributing data within Data Grids accordingly, improves throughput for data-intensive applications and enables better load balancing across shared resources. We propose a framework for investigating application-specific index structures as a basis for suitable partitioning schemes. Using several criteria, we evaluate two variants of the well-known Quadtree data structure as well as the Zones approach, an index structure from the astrophysics domain. Our framework improves data access within federated Data Grids and can be combined with well-established Grid methods as well as with more flexible P2P technologies.
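
To illustrate how a Quadtree can yield a partitioning scheme that adapts to data skew, the following is a minimal sketch of a generic point-region quadtree whose leaves act as partitions. It is not one of the two variants evaluated in the paper, which the abstract does not specify. The names `QuadNode`, `build_partitions`, `capacity`, and `min_size` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class QuadNode:
    """A square region [x0, x1) x [y0, y1); each leaf is one partition."""
    x0: float
    y0: float
    x1: float
    y1: float
    capacity: int = 4          # assumed split threshold (points per leaf)
    min_size: float = 1e-9     # assumed guard against endless splitting
    points: List[Point] = field(default_factory=list)
    children: List["QuadNode"] = field(default_factory=list)

    def insert(self, p: Point) -> None:
        # Inner nodes only route the point to the matching quadrant.
        if self.children:
            self._child_for(p).insert(p)
            return
        self.points.append(p)
        # Split an overfull leaf unless the region is already tiny
        # (e.g. many duplicate points).
        if len(self.points) > self.capacity and (self.x1 - self.x0) > self.min_size:
            self._split()

    def _split(self) -> None:
        mx = (self.x0 + self.x1) / 2
        my = (self.y0 + self.y1) / 2
        # Child order matches the index computed in _child_for.
        self.children = [
            QuadNode(self.x0, self.y0, mx, my, self.capacity, self.min_size),
            QuadNode(mx, self.y0, self.x1, my, self.capacity, self.min_size),
            QuadNode(self.x0, my, mx, self.y1, self.capacity, self.min_size),
            QuadNode(mx, my, self.x1, self.y1, self.capacity, self.min_size),
        ]
        pending, self.points = self.points, []
        for q in pending:
            self._child_for(q).insert(q)

    def _child_for(self, p: Point) -> "QuadNode":
        mx = (self.x0 + self.x1) / 2
        my = (self.y0 + self.y1) / 2
        index = (1 if p[0] >= mx else 0) + (2 if p[1] >= my else 0)
        return self.children[index]

    def leaves(self) -> List["QuadNode"]:
        # Collect all leaf regions, i.e. the resulting partitions.
        if not self.children:
            return [self]
        result: List["QuadNode"] = []
        for child in self.children:
            result.extend(child.leaves())
        return result


def build_partitions(points: List[Point], capacity: int = 4) -> QuadNode:
    """Build a quadtree over the unit square from the given points."""
    root = QuadNode(0.0, 0.0, 1.0, 1.0, capacity)
    for p in points:
        root.insert(p)
    return root
```

Dense regions are subdivided into many small leaves while sparse regions stay coarse, so each partition holds a bounded number of objects. In a federated setting, such leaves could then be assigned to different nodes to balance load. That assignment step is outside the scope of this sketch.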