Decoupling Computation and Data Scheduling in Distributed Data-Intensive Applications

Authors:
Kavitha Ranganathan; Ian Foster
Venue:
HPDC '02 Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing
Year:
2002

Citing 17
Cited 93



Abstract

In high energy physics, bioinformatics, and other disciplines, we encounter applications involving numerous, loosely coupled jobs that both access and generate large data sets. So-called Data Grids seek to harness geographically distributed resources for such large-scale data-intensive problems. Yet effective scheduling in such environments is challenging, due to a need to address a variety of metrics and constraints (e.g., resource utilization, response time, global and local allocation policies) while dealing with multiple, potentially independent sources of jobs and a large number of storage, compute, and network resources. We describe a scheduling framework that addresses these problems. Within this framework, data movement operations may be either tightly bound to job scheduling decisions or, alternatively, performed by a decoupled, asynchronous process on the basis of observed data access patterns and load. We develop a family of job scheduling and data movement (replication) algorithms and use simulation studies to evaluate various combinations. Our results suggest that while it is necessary to consider the impact of replication on the scheduling strategy, it is not always necessary to couple data movement and computation scheduling. Instead, these two activities can be addressed separately, thus significantly simplifying the design and implementation of the overall Data Grid system.
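The decoupled arrangement described in the abstract can be illustrated with a minimal Python sketch. It is not the family of algorithms evaluated in the paper: the class names (`Site`, `JobScheduler`, `Replicator`), the queue-length load measure, and the access-count threshold are all assumptions made for illustration. The scheduler only looks at where data already resides, while a separate replication step reacts to observed access counts and load.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical grid site: local storage plus a job queue."""
    name: str
    files: set = field(default_factory=set)  # datasets stored at this site
    queue: int = 0                           # waiting jobs, used as a load proxy


class JobScheduler:
    """Sends each job to the least-loaded site that already holds its input.

    It never moves data itself; it only records which files are requested.
    """

    def __init__(self, sites):
        self.sites = sites
        self.accesses = Counter()  # observed accesses per file

    def submit(self, input_file):
        holders = [s for s in self.sites if input_file in s.files]
        if not holders:
            raise LookupError(f"no replica of {input_file!r}")
        target = min(holders, key=lambda s: s.queue)
        target.queue += 1
        self.accesses[input_file] += 1
        return target


class Replicator:
    """Independent process: replicates popular files to lightly loaded sites.

    The threshold policy is an assumption chosen for this sketch.
    """

    def __init__(self, sites, threshold):
        self.sites = sites
        self.threshold = threshold

    def step(self, accesses):
        created = []
        for name, count in list(accesses.items()):
            if count < self.threshold:
                continue
            candidates = [s for s in self.sites if name not in s.files]
            if candidates:
                target = min(candidates, key=lambda s: s.queue)
                target.files.add(name)
                created.append((name, target.name))
            accesses[name] = 0  # restart the popularity count
        return created


# Usage: three jobs pile up at the only holder, then replication relieves it.
sites = [Site("A", {"d1"}), Site("B"), Site("C")]
scheduler = JobScheduler(sites)
replicator = Replicator(sites, threshold=3)
for _ in range(3):
    scheduler.submit("d1")
new_replicas = replicator.step(scheduler.accesses)
next_site = scheduler.submit("d1")
```

In this sketch the two components share only the access counter, so either policy could be swapped out without touching the other, which is the kind of simplification the abstract points to.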