Workload-aware data partitioning in community-driven data grids

Authors:
Tobias Scholl;Bernhard Bauer;Jessica Müller;Benjamin Gufler;Angelika Reiser;Alfons Kemper
Affiliations:
Technische Universität München, Munich, Germany;Technische Universität München, Munich, Germany;Technische Universität München, Munich, Germany;Technische Universität München, Munich, Germany;Technische Universität München, Munich, Germany;Technische Universität München, Munich, Germany
Venue:
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Year:
2009

Abstract

Collaborative research in many scientific disciplines requires scalable data management that enables the efficient correlation of globally distributed data sources. Motivated by the expected data rates of upcoming projects and a growing number of users, communities are exploring new data management techniques to achieve high throughput. Community-driven data grids deliver such high-throughput data distribution for scientific federations by partitioning data according to application-specific data and query characteristics. Query hot spots are an important and challenging problem in this environment. Existing load-balancing approaches from Peer-to-Peer (P2P) data management and sensor networks do not directly meet the requirements of a data-intensive e-science environment. In this paper, we contribute partitioning schemes based on multi-dimensional index structures that enable communities to trade off data load balancing against the handling of query hot spots via splitting and replication. We evaluate the partitioning schemes with two typical kinds of data sets from the astrophysics domain and with workloads extracted from Sloan Digital Sky Survey (SDSS) query traces, and we perform throughput measurements in real and simulated networks. The experiments demonstrate the improved workload distribution capabilities and give promising directions for the development of future community grids.
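
The trade-off between data load balancing and query hot-spot handling can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a two-dimensional quadtree as the multi-dimensional index, point queries as the training workload, and a hypothetical blending parameter `alpha` (0 weighs only stored data, 1 weighs only observed queries). The names `Region`, `build`, `leaves`, and `replicas` are likewise illustrative. A region is split while its blended weight exceeds a threshold, and a leaf that is still overloaded at the maximum depth is assigned extra replicas.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Region:
    """A square region of a 2-D data space (for example, a patch of sky)."""
    x: float
    y: float
    size: float
    points: list = field(default_factory=list)    # stored data items as (x, y)
    queries: int = 0                               # query hits from a training workload
    children: list = field(default_factory=list)

    def contains(self, px, py):
        return (self.x <= px < self.x + self.size
                and self.y <= py < self.y + self.size)

    def weight(self, alpha):
        # Hypothetical blend: alpha=0 is pure data load, alpha=1 is pure query load.
        return (1 - alpha) * len(self.points) + alpha * self.queries


def build(region, query_points, alpha, threshold, max_depth=8, depth=0):
    """Recursively split a region while its blended weight exceeds the threshold."""
    region.queries = sum(region.contains(qx, qy) for qx, qy in query_points)
    if depth >= max_depth or region.weight(alpha) <= threshold:
        return region
    half = region.size / 2
    for dx in (0, half):
        for dy in (0, half):
            child = Region(region.x + dx, region.y + dy, half)
            child.points = [p for p in region.points if child.contains(*p)]
            build(child, query_points, alpha, threshold, max_depth, depth + 1)
            region.children.append(child)
    return region


def leaves(region):
    """Return the leaf regions, i.e. the partitions that would be assigned to nodes."""
    if not region.children:
        return [region]
    return [leaf for child in region.children for leaf in leaves(child)]


def replicas(leaf, alpha, threshold):
    """Number of copies for a leaf that splitting alone could not relieve."""
    return max(1, math.ceil(leaf.weight(alpha) / threshold))


# Uniform data on a 4x4 grid, with all queries hitting one corner (a hot spot).
data = [(i + 0.5, j + 0.5) for i in range(4) for j in range(4)]
hot_queries = [(0.25, 0.25)] * 40

by_data = build(Region(0, 0, 4, points=list(data)), hot_queries,
                alpha=0, threshold=4, max_depth=3)
by_queries = build(Region(0, 0, 4, points=list(data)), hot_queries,
                   alpha=1, threshold=4, max_depth=3)
```

With `alpha=0` the sketch yields evenly sized partitions that ignore the hot spot. With `alpha=1` it keeps subdividing the hot corner, and because identical point queries can never be separated by splitting, the remaining overload is absorbed by replication.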