Traditional high-performance computing (HPC) systems have independent job schedulers and file systems that do not interact in substantial ways. We make the case that some integration of scheduler and file system can have three main benefits. First, the dataflow dependencies between the jobs in a workflow can be inferred by combining the scheduler's knowledge of the jobs (and possibly of their control flow) with the file system's knowledge of the files accessed. Second, the dataflow information can be used to improve concurrency among workflow instances when there are (potential) filename conflicts. Third, when workflows need to be re-computed, only the affected jobs need to be re-executed. We present the design and a simulation study of the Workflow-Aware File System (WaFS). Our design layers a Namespace Manager (NM) on top of existing file systems to provide, for example, a dataflow engine and a versioned file system. Our simulation study (with a specific set of application parameters) shows that a combined WaFS-aware file system and scheduler can significantly improve makespans for intensive workloads and can re-compute jobs efficiently.
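The first and third benefits can be illustrated with a small sketch. It is not the WaFS design. It only assumes that each job's read and write sets are known (the file system's view) and that jobs are listed in submission order (the scheduler's view). The function names `infer_dataflow` and `affected_jobs`, and the job and file names, are hypothetical and chosen for illustration.

```python
from collections import defaultdict, deque


def infer_dataflow(jobs):
    """Infer producer -> consumers edges from per-job file accesses.

    `jobs` is a list of (name, reads, writes) tuples in submission order
    (an assumed input format, not one taken from the paper).
    """
    last_writer = {}            # path -> job that most recently wrote it
    edges = defaultdict(set)    # producer job -> set of consumer jobs
    for name, reads, writes in jobs:
        for path in reads:
            producer = last_writer.get(path)
            if producer is not None and producer != name:
                edges[producer].add(name)
        for path in writes:
            last_writer[path] = name
    return edges


def affected_jobs(edges, changed_job):
    """Return the changed job plus every job downstream of it."""
    seen = {changed_job}
    queue = deque([changed_job])
    while queue:
        job = queue.popleft()
        for consumer in edges.get(job, ()):
            if consumer not in seen:
                seen.add(consumer)
                queue.append(consumer)
    return seen


# Hypothetical four-job workflow.
jobs = [
    ("prep", {"raw.dat"}, {"clean.dat"}),
    ("stats", {"clean.dat"}, {"stats.out"}),
    ("plot", {"stats.out"}, {"fig.png"}),
    ("archive", {"raw.dat"}, {"raw.tar"}),
]
edges = infer_dataflow(jobs)
rerun = affected_jobs(edges, "stats")
```

In this example, a change to `stats` marks only `stats` and `plot` for re-execution. `prep` and `archive` are left alone because neither consumes a file produced downstream of `stats`.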