Explicit control in a batch-aware distributed file system

Authors:
John Bent;Douglas Thain;Andrea C. Arpaci-Dusseau;Remzi H. Arpaci-Dusseau;Miron Livny
Affiliations:
Computer Science Department, University of Wisconsin, Madison;Computer Science Department, University of Wisconsin, Madison;Computer Science Department, University of Wisconsin, Madison;Computer Science Department, University of Wisconsin, Madison;Computer Science Department, University of Wisconsin, Madison
Venue:
NSDI'04 Proceedings of the 1st conference on Symposium on Networked Systems Design and Implementation - Volume 1
Year:
2004


Abstract

We present the design, implementation, and evaluation of the Batch-Aware Distributed File System (BAD-FS), a system designed to orchestrate large, I/O-intensive batch workloads on remote computing clusters distributed across the wide area. BAD-FS consists of two novel components: a storage layer that exposes control of traditionally fixed policies such as caching, consistency, and replication; and a scheduler that exploits this control as necessary for different workloads. By extracting control from the storage layer and placing it within an external scheduler, BAD-FS manages both storage and computation in a coordinated way while gracefully dealing with cache consistency, fault tolerance, and space management issues in a workload-specific manner. Using both microbenchmarks and real workloads, we demonstrate the performance benefits of explicit control, which delivers excellent end-to-end performance across the wide area.