Intelligent data staging with overlapped execution of grid applications

Authors:
Yuya Machida;Shin'ichiro Takizawa;Hidemoto Nakada;Satoshi Matsuoka
Affiliations:
Tokyo Institute of Technology, 2-12-1 Ookayama, Tokyo, 152-8550, Japan;Tokyo Institute of Technology, 2-12-1 Ookayama, Tokyo, 152-8550, Japan;National Institute of Advanced Industrial Science and Technology, 1-1-1 Umezono, Tsukuba, 305-8568, Japan;Tokyo Institute of Technology, 2-12-1 Ookayama, Tokyo, 152-8550, Japan and National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo, 101-8430, Japan
Venue:
Future Generation Computer Systems
Year:
2008

Abstract

Existing data grid scheduling systems handle huge data I/O via replica location services coupled with simple staging, decoupled from the scheduling of computing tasks. However, as the application or workflow scales, we observe considerable performance degradation compared to processing within a tightly-coupled cluster. For example, when numerous nodes access the same set of files simultaneously, performance degrades severely even when replicas are used, because of bottlenecks in the infrastructure. Instead of resorting to expensive solutions such as parallel file systems, we propose tightly coupling replica and data transfer management with computation scheduling to alleviate such situations. In particular, we propose three techniques: (1) aggregation of data-staging requests and O(1) replication across multiple nodes using a multireplication framework, (2) replica-centric scheduling, which reuses previously used data to minimize staging time, and (3) overlapped execution of data staging and compute-bound tasks. Early benchmark results obtained with our prototype Condor-like grid scheduling system, which implements these techniques, demonstrate that they are quite effective in eliminating much of the data transfer overhead and in achieving 100% CPU utilization.
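The three techniques named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names (`Task`, `aggregate_requests`, `pick_node`, `staging_time`, `makespan`), the greedy "most cached input bytes" placement rule, and the cost model (one serial staging link per node at fixed bandwidth, unbounded local disk so staging may run ahead of computation, tasks computed in staging order) are hypothetical simplifications chosen for illustration. The O(1) transfer mechanism of the multireplication framework itself is not modeled. Only the grouping of requests that would feed it is shown.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    inputs: frozenset   # hypothetical: names of the input files the task reads
    compute: float      # hypothetical: seconds of CPU-bound work


def aggregate_requests(requests):
    """(1) Merge per-task (node, file) staging requests into one job per file.

    Each file can then be replicated once to all of its destination nodes
    instead of being fetched separately by every requesting node.
    """
    by_file = defaultdict(set)
    for node, fname in requests:
        by_file[fname].add(node)          # duplicate requests collapse here
    return {fname: sorted(nodes) for fname, nodes in by_file.items()}


def pick_node(task, node_cache, file_size):
    """(2) Replica-centric placement: the node already caching the most input bytes.

    Ties are broken by node name so the choice is deterministic.
    """
    def cached_bytes(node):
        return sum(file_size[f] for f in task.inputs & node_cache[node])
    return max(sorted(node_cache), key=cached_bytes)


def staging_time(task, cached, file_size, bandwidth):
    """Time to transfer only the inputs not yet cached on the chosen node."""
    missing = task.inputs - cached
    return sum(file_size[f] for f in missing) / bandwidth


def makespan(stage, compute, overlap):
    """(3) Completion time of a task sequence on one node.

    Without overlap, each task stages its data and then computes.
    With overlap, staging runs back to back on the (assumed serial) link,
    and the CPU processes tasks in order as soon as each one's data has arrived.
    """
    if not overlap:
        return sum(stage) + sum(compute)
    stage_done = cpu_done = 0.0
    for s, c in zip(stage, compute):
        stage_done += s                            # this task's data has arrived
        cpu_done = max(cpu_done, stage_done) + c   # wait for both CPU and data
    return cpu_done


if __name__ == "__main__":
    file_size = {"a.dat": 400, "b.dat": 200, "c.dat": 100}   # MB, hypothetical
    node_cache = {"n1": {"b.dat"}, "n2": {"a.dat", "c.dat"}}
    t = Task("t1", frozenset({"a.dat", "b.dat"}), compute=10.0)

    print(aggregate_requests([("n1", "a.dat"), ("n2", "a.dat"), ("n1", "a.dat")]))
    node = pick_node(t, node_cache, file_size)
    print(node, staging_time(t, node_cache[node], file_size, bandwidth=100))
    print(makespan([4, 4, 4], [5, 5, 5], overlap=False),
          makespan([4, 4, 4], [5, 5, 5], overlap=True))
```

With these hypothetical numbers, `n2` is chosen (400 MB of the task's input already cached versus 200 MB on `n1`), so only `b.dat` must be staged, taking 2 s at 100 MB/s. Three tasks with 4 s of staging and 5 s of compute each finish at 27 s when run sequentially but at 19 s with overlap, because after the first 4 s stage the CPU is never idle waiting for data.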