Stork: Making Data Placement a First Class Citizen in the Grid

Authors:
Tevfik Kosar;Miron Livny
Affiliations:
-;-
Venue:
ICDCS '04 Proceedings of the 24th International Conference on Distributed Computing Systems (ICDCS'04)
Year:
2004

Citing 0
Cited 54

A fully automated fault-tolerant system for distributed video processing and off-site replication

NOSSDAV '04 Proceedings of the 14th international workshop on Network and operating systems support for digital audio and video
Data pipelines: enabling large scale multi-protocol data transfers

MGC '04 Proceedings of the 2nd workshop on Middleware for grid computing
Phoenix: Making Data-Intensive Grid Applications Fault-Tolerant

GRID '04 Proceedings of the 5th IEEE/ACM International Workshop on Grid Computing
Integrating databases and workflow systems

ACM SIGMOD Record
Grid harvest service: a performance system of grid computing

Journal of Parallel and Distributed Computing
Advanced resource connector middleware for lightweight computational Grids

Future Generation Computer Systems - Special section: Information engineering and enterprise architecture in distributed computing environments
Managing data persistence in network enabled servers

Scientific Programming - Dynamic Grids and Worldwide Computing
Job scheduling and data replication on data grids

Future Generation Computer Systems
Data driven workflow planning in cluster management systems

Proceedings of the 16th international symposium on High performance distributed computing
A distributed job scheduling and flow management system

ACM SIGOPS Operating Systems Review
Intelligent data staging with overlapped execution of grid applications

Future Generation Computer Systems
A control theoretical approach to self-optimizing block transfer in Web service grids

ACM Transactions on Autonomous and Adaptive Systems (TAAS)
Optimizing center performance through coordinated data staging, scheduling and recovery

Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Dynamic service selection in workflows using performance data

Scientific Programming - Dynamic Computational Workflows: Discovery, Optimization and Scheduling
INFORM: integrated flow orchestration and meta-scheduling for managed grid systems

Proceedings of the 2007 ACM/IFIP/USENIX international conference on Middleware companion
Dynamically tuning level of parallelism in wide area data transfers

DADC '08 Proceedings of the 2008 international workshop on Data-aware distributed computing
Designing a resource broker for heterogeneous grids

Software—Practice & Experience
BitDew: a programmable environment for large-scale data management and distribution

Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Using overlays for efficient data transfer over shared wide-area networks

Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Performance Evaluation of Data Management Layer by Data Sharing Patterns for Grid RPC Applications

Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
A new paradigm: Data-aware scheduling in grid computing

Future Generation Computer Systems
Multi-Replication with Intelligent Staging in Data-Intensive Grid Applications

GRID '06 Proceedings of the 7th IEEE/ACM International Conference on Grid Computing
/scratch as a cache: rethinking HPC center scratch storage

Proceedings of the 23rd international conference on Supercomputing
Data-driven batch scheduling

Proceedings of the second international workshop on Data-aware distributed computing
Balancing TCP buffer vs parallel streams in application level throughput optimization

Proceedings of the second international workshop on Data-aware distributed computing
Design and Implementation of Metadata System in PetaShare

SSDBM 2009 Proceedings of the 21st International Conference on Scientific and Statistical Database Management
BitDew: A data management and distribution service with multi-protocol file transfer and metadata abstraction

Journal of Network and Computer Applications
Semantic enabled metadata management in PetaShare

International Journal of Grid and Utility Computing
Scheduling data-intensive workflows on storage constrained resources

Proceedings of the 4th Workshop on Workflows in Support of Large-Scale Science
Lessons learned from a year's worth of benchmarks of large data clouds

Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers
Node-capability-aware replica management for peer-to-peer grids

IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans
ROW-FS: a user-level virtualized redirect-on-write distributed file system for wide area applications

HiPC'07 Proceedings of the 14th international conference on High performance computing
Overlay network management for scheduling tasks on the grid

ICDCIT'07 Proceedings of the 4th international conference on Distributed computing and internet technology
A data placement strategy in scientific cloud workflows

Future Generation Computer Systems
File-Access Characteristics of Data-Intensive Workflow Applications

CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
A data transfer framework for large-scale science experiments

Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
File-access patterns of data-intensive workflow applications and their implications to distributed filesystems

Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
GatorShare: a file system framework for high-throughput data management

Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Error detection and error classification: failure awareness in data transfer scheduling

International Journal of Autonomic Computing
Improving workflow fault tolerance through provenance-based recovery

SSDBM'11 Proceedings of the 23rd international conference on Scientific and statistical database management
DECO: data replication and execution CO-scheduling for utility grids

ICSOC'06 Proceedings of the 4th international conference on Service-Oriented Computing
Simultaneous scheduling of replication and computation for bioinformatic applications on the grid

ISBMDA'05 Proceedings of the 6th International conference on Biological and Medical Data Analysis
Evolving toward the perfect schedule: co-scheduling job assignments and data replication in wide-area systems using a genetic algorithm

JSSPP'05 Proceedings of the 11th international conference on Job Scheduling Strategies for Parallel Processing
Moving huge scientific datasets over the Internet

Concurrency and Computation: Practice & Experience
ATLAS grid workload on NDGF resources: analysis, modeling, and workload generation

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Data transfer in advance on cluster

PaCT'07 Proceedings of the 9th international conference on Parallel Computing Technologies
Data Placement in P2P Data Grids Considering the Availability, Security, Access Performance and Load Balancing

Journal of Grid Computing
Adapting scientific workflow structures using multi-objective optimization strategies

ACM Transactions on Autonomous and Adaptive Systems (TAAS)
Taming massive distributed datasets: data sampling using bitmap indices

Proceedings of the 22nd international symposium on High-performance parallel and distributed computing
StorkCloud: data transfer scheduling and optimization as a service

Proceedings of the 4th ACM workshop on Scientific cloud computing
Octopus: efficient data intensive computing on virtualized datacenters

Proceedings of the 6th International Systems and Storage Conference
A case for MapReduce over the internet

Proceedings of the 2013 ACM Cloud and Autonomic Computing Conference
SDQuery DSI: integrating data management support with a wide area data transfer protocol

SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Dynamic protocol tuning algorithms for high performance data transfers

Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing

Quantified Score

Hi-index	0.00

Abstract

Today's scientific applications have huge data requirements, which continue to increase drastically every year. These data are generally accessed by many users from all across the globe. This implies a major necessity to move huge amounts of data around wide area networks to complete the computation cycle, which brings with it the problem of efficient and reliable data placement. The current approach to solving this data placement problem is either to do it manually or to employ simple scripts, which have no automation or fault tolerance capabilities. Our goal is to make data placement activities first class citizens in the Grid, just like computational jobs. They will be queued, scheduled, monitored, managed, and even checkpointed. More importantly, the system will ensure that they complete successfully and without any human interaction. We also believe that data placement jobs should be treated differently from computational jobs, since they may have different semantics and different characteristics. For this purpose, we have developed Stork, a scheduler for data placement activities in the Grid.