GPFS: a shared-disk file system for large computing clusters

Authors:
Frank Schmuck; Roger Haskin
Affiliations:
IBM Almaden Research Center, San Jose, CA; IBM Almaden Research Center, San Jose, CA
Venue:
FAST '02: Proceedings of the 1st USENIX Conference on File and Storage Technologies
Year:
2002

Abstract

GPFS is IBM's parallel, shared-disk file system for cluster computers, available on the RS/6000 SP parallel supercomputer and on Linux clusters. GPFS is used on many of the largest supercomputers in the world. GPFS was built on many of the ideas that were developed in the academic community over the last several years, particularly distributed locking and recovery technology. To date it has been a matter of conjecture how well these ideas scale. We have had the opportunity to test those limits in the context of a product that runs on the largest systems in existence. While in many cases existing ideas scaled well, new approaches were necessary in many key areas. This paper describes GPFS, and discusses how distributed locking and recovery techniques were extended to scale to large clusters.