Using Disk Throughput Data in Predictions of End-to-End Grid Data Transfers

Authors:
Sudharshan Vazhkudai;Jennifer M. Schopf
Venue:
GRID '02 Proceedings of the Third International Workshop on Grid Computing
Year:
2002


Abstract

Data grids provide an environment for communities of researchers to share, replicate, and manage access to copies of large datasets. In such environments, fetching data from one of the several replica locations requires accurate predictions of end-to-end transfer times. Predicting transfer time is significantly complicated because of the involvement of several shared components, including networks and disks in the end-to-end data path, each of which experiences load variations that can significantly affect the throughput. Of these, disk accesses are rapidly growing in cost and have not been previously considered, although on some machines they can be up to 30% of the transfer time. In this paper, we present techniques to combine observations of end-to-end application behavior and disk I/O throughput load data. We develop a set of regression models to derive predictions that characterize the effect of disk load variations on file transfer times. We also include network component variations and apply these techniques to the logs of transfer data using the GridFTP server, part of the Globus Toolkit™. We observe up to 9% improvement in prediction accuracy when compared with approaches based on past system behavior in isolation.
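
The idea of regressing end-to-end transfer bandwidth on observed disk throughput can be illustrated with a minimal sketch. This is not the authors' model: it assumes a single-variable ordinary least-squares fit, and the function names (`fit_linear`, `predict_transfer_time`) and the sample values are hypothetical, chosen only to make the example runnable.

```python
from typing import Sequence, Tuple


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x.

    Here xs stands for observed disk throughput (MB/s) and ys for the
    end-to-end transfer bandwidth (MB/s) logged at the same times.
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("disk throughput values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_transfer_time(file_size_mb: float, disk_mb_s: float,
                          model: Tuple[float, float]) -> float:
    """Predict transfer time in seconds from the current disk throughput."""
    intercept, slope = model
    bandwidth = intercept + slope * disk_mb_s
    if bandwidth <= 0:
        raise ValueError("model predicts non-positive bandwidth")
    return file_size_mb / bandwidth


# Made-up illustrative observations, not measurements from the paper.
disk_throughput = [10.0, 20.0, 30.0, 40.0]   # MB/s seen at the disk
transfer_bandwidth = [3.0, 5.0, 7.0, 9.0]    # MB/s achieved end to end

model = fit_linear(disk_throughput, transfer_bandwidth)
estimate_s = predict_transfer_time(100.0, 25.0, model)
```

In a replica-selection setting, a fit of this kind would be computed per source site from its transfer logs, and the site with the smallest predicted time would be preferred. Adding network observations, as the abstract describes, would turn this into a multivariate regression.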