Community Training: Partitioning Schemes in Good Shape for Federated Data Grids

Authors:
Tobias Scholl, Richard Kuntschke, Angelika Reiser, Alfons Kemper
Venue:
E-SCIENCE '07: Proceedings of the Third IEEE International Conference on e-Science and Grid Computing
Year:
2007

Abstract

In federated Data Grids, individual institutions share their data sets within a community to enable collaborative data analysis. Data access needs to be provided in a scalable fashion, since in most e-science communities data sets not only grow exponentially but also become increasingly popular. If data autonomy is retained, each institution has to ensure efficient access to its own data. Analyzing application-specific data properties (such as data skew) or query characteristics (such as query patterns) and distributing data within Data Grids accordingly allows for improved throughput for data-intensive applications and enables better load balancing between shared resources. We propose a framework for investigating application-specific index structures in order to create suitable partitioning schemes. We evaluate two variants of the well-known Quadtree data structure, as well as the Zones approach, an index structure from the astrophysics domain, according to several criteria. Our framework improves data access within federated Data Grids and can be combined with well-established Grid methods as well as with more flexible P2P technologies.
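To make the contrast between the index structures more concrete, here is a minimal Python sketch. It is not the authors' framework, and it does not reproduce their two Quadtree variants or their evaluation criteria. The sketch assumes two-dimensional sky coordinates (right ascension and declination in degrees). It uses a simple region quadtree that splits any cell holding more than a hypothetical `capacity` of objects, and a Zones-style scheme with a fixed, hypothetical stripe height `zone_height_deg`. The helper names (`zone_id`, `quadtree_partitions`, `make_skewed_points`) and the synthetic skewed data set are illustrative assumptions only.

```python
import math
import random
from collections import Counter

# Assumed bounds convention (x0, y0, x1, y1): x = right ascension in degrees
# [0, 360), y = declination in degrees [-90, 90]. Illustrative only.
SKY = (0.0, -90.0, 360.0, 90.0)


def zone_id(dec_deg, zone_height_deg):
    """Zones-style partitioning (assumed fixed height): stripes of equal
    declination height, numbered upward from dec = -90."""
    return int(math.floor((dec_deg + 90.0) / zone_height_deg))


def quadtree_partitions(points, bounds, capacity, max_depth=12, depth=0):
    """Region quadtree: split a cell into four quadrants while it holds more
    than `capacity` points (or until `max_depth`). Returns leaves as
    (bounds, points) pairs; every input point lands in exactly one leaf."""
    x0, y0, x1, y1 = bounds
    if len(points) <= capacity or depth >= max_depth:
        return [(bounds, points)]
    xm, ym = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    children = [(x0, y0, xm, ym), (xm, y0, x1, ym),
                (x0, ym, xm, y1), (xm, ym, x1, y1)]
    buckets = [[], [], [], []]
    for x, y in points:
        # Index bit 0 selects the east half, bit 1 selects the north half.
        buckets[int(x >= xm) + 2 * int(y >= ym)].append((x, y))
    leaves = []
    for child, bucket in zip(children, buckets):
        leaves.extend(
            quadtree_partitions(bucket, child, capacity, max_depth, depth + 1))
    return leaves


def make_skewed_points(n, seed=42):
    """Hypothetical skewed catalog: 10% uniform background objects and 90% in
    a dense cluster around (ra=180, dec=0)."""
    rng = random.Random(seed)
    pts = []
    for i in range(n):
        if i % 10 == 0:
            pts.append((rng.uniform(0.0, 360.0), rng.uniform(-90.0, 90.0)))
        else:
            pts.append((rng.gauss(180.0, 2.0), rng.gauss(0.0, 2.0)))
    return pts


if __name__ == "__main__":
    pts = make_skewed_points(1000)
    leaves = quadtree_partitions(pts, SKY, capacity=100)
    zones = Counter(zone_id(dec, 10.0) for _, dec in pts)
    print("quadtree leaves:", len(leaves),
          "largest leaf:", max(len(p) for _, p in leaves))
    print("zones used:", len(zones), "largest zone:", max(zones.values()))
```

On this synthetic data, the quadtree keeps every leaf at or below `capacity` by subdividing only the crowded region. The fixed-height zones, by contrast, concentrate most objects in the stripes that cross the cluster. This illustrates, under the sketch's assumptions, why data skew matters when choosing a partitioning scheme.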