Using Regression Techniques to Predict Large Data Transfers

  • Authors:
  • Sudharshan Vazhkudai; Jennifer M. Schopf

  • Affiliations:
  • Department of Computer and Information Science, The University of Mississippi; Mathematics and Computer Science Division, Argonne National Laboratory

  • Venue:
  • International Journal of High Performance Computing Applications
  • Year:
  • 2003

Abstract

The recent proliferation of Data Grids and the increasingly common practice of using resources as distributed data stores provide a convenient environment for communities of researchers to share, replicate, and manage access to copies of large datasets. This raises the question of which replica can be accessed most efficiently. In such environments, fetching data from one of several replica locations requires accurate predictions of end-to-end transfer times. The answer can depend on many factors, including the physical characteristics of the resources and the load behavior on the CPUs, networks, and storage devices that form the end-to-end data path linking possible sources and sinks. Our approach combines end-to-end application throughput observations with network and disk load variations, capturing whole-system performance and variations in load patterns. Our predictions characterize the effect of load variations on several shared devices (network and disk) on file transfer times. We develop a suite of univariate and multivariate predictors that can use multiple data sources to improve prediction accuracy and to address Data Grid variations (availability of data and the sporadic nature of transfers). We ran a large set of data transfer experiments using GridFTP and observed performance predictions within 15% error for our testbed sites, which is quite promising for a practical system.
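
To make the regression idea concrete, the following is a minimal sketch, not the authors' implementation, of a multivariate linear predictor that relates concurrent network and disk load observations to end-to-end transfer time. The feature choices, variable names, and sample values are illustrative assumptions.

```python
import numpy as np

# Hypothetical training data: each row pairs a load observation with the
# observed end-to-end transfer time (seconds) for a fixed file size.
# Columns: [network load (e.g., available bandwidth, Mb/s),
#           disk load (e.g., disk throughput, MB/s)]
loads = np.array([
    [80.0, 25.0],
    [60.0, 30.0],
    [45.0, 18.0],
    [90.0, 28.0],
])
transfer_times = np.array([120.0, 150.0, 210.0, 105.0])

# Fit time ~ a*net_load + b*disk_load + c via ordinary least squares.
X = np.column_stack([loads, np.ones(len(loads))])
coeffs, *_ = np.linalg.lstsq(X, transfer_times, rcond=None)

# Predict the transfer time for a new load observation (values assumed).
new_obs = np.array([70.0, 22.0, 1.0])
predicted_time = new_obs @ coeffs
print(f"Predicted transfer time: {predicted_time:.1f} s")
```

A univariate variant of the same sketch would regress on past GridFTP throughput alone; the multivariate form above illustrates how additional load sources can be folded into the prediction.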