Data Sieving and Collective I/O in ROMIO

Authors:
Rajeev Thakur; William Gropp; Ewing Lusk
Venue:
FRONTIERS '99 Proceedings of the 7th Symposium on the Frontiers of Massively Parallel Computation
Year:
1999

Cited by (99)

On implementing MPI-IO portably and with high performance

Proceedings of the sixth workshop on I/O in parallel and distributed systems
A novel application development environment for large-scale scientific computations

Proceedings of the 14th international conference on Supercomputing
Performance analysis of MPI-I/O primitives on a PC cluster

Proceedings of the 2002 ACM symposium on Applied computing
Active buffering plus compressed migration: an integrated solution to parallel simulations' data transport needs

ICS '02 Proceedings of the 16th international conference on Supercomputing
MPI-IO/GPFS, an optimized implementation of MPI-IO on top of GPFS

Proceedings of the 2001 ACM/IEEE conference on Supercomputing
Data management for large-scale scientific computations in high performance distributed systems

Cluster Computing
Enhancing Data Migration Performance via Parallel Data Compression

IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Compiler-Directed I/O Optimization

IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Irregular and Out-of-Core Parallel Computing on Clusters

PPAM '01 Proceedings of the th International Conference on Parallel Processing and Applied Mathematics-Revised Papers
Towards a High-Performance Implementation of MPI-IO on Top of GPFS

Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Towards Portable Runtime Support for Irregular and Out-of-Core Computations

Proceedings of the 6th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Profile-guided I/O partitioning

ICS '03 Proceedings of the 17th annual international conference on Supercomputing
References

Sourcebook of parallel computing
A distributed multi-storage I/O system for data intensive scientific computing

Parallel Computing - Special issue: Parallel and distributed scientific and engineering computing
Integrating collective I/O and cooperative caching into the "clusterfile" parallel file system

Proceedings of the 18th annual international conference on Supercomputing
A high-performance distributed parallel file system for data-intensive computations

Journal of Parallel and Distributed Computing
Parallel netCDF: A High-Performance Scientific I/O Interface

Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Fast Parallel Non-Contiguous File Access

Proceedings of the 2003 ACM/IEEE conference on Supercomputing
A study of I/O methods for parallel visualization of large-scale data

Parallel Computing - Parallel graphics and visualization
Energy-aware data prefetching for multi-speed disks

Proceedings of the 3rd conference on Computing frontiers
Source level transformations to improve I/O data partitioning

SNAPI '03 Proceedings of the international workshop on Storage network architecture and parallel I/Os
Scalable Design and Implementations for MPI Parallel Overlapping I/O

IEEE Transactions on Parallel and Distributed Systems
Large files, small writes, and pNFS

Proceedings of the 20th annual international conference on Supercomputing
Coupling prefix caching and collective downloads for remote dataset access

Proceedings of the 20th annual international conference on Supercomputing
Startup comparison for message passing libraries with DTM on linux clusters

The Journal of Supercomputing
PVFS: a parallel file system for linux clusters

ALS'00 Proceedings of the 4th annual Linux Showcase & Conference - Volume 4
Evaluating structured I/O methods for parallel file systems

International Journal of High Performance Computing and Networking
Improving I/O performance of applications through compiler-directed code restructuring

FAST'08 Proceedings of the 6th USENIX Conference on File and Storage Technologies
Study of content-based image retrieval using parallel computing technique

CHINA HPC '07 Proceedings of the 2007 Asian technology information program's (ATIP's) 3rd workshop on High performance computing in China: solution approaches to impediments for high performance computing
Semantic-based distributed i/o with the paramedic framework

HPDC '08 Proceedings of the 17th international symposium on High performance distributed computing
DART: a substrate for high speed asynchronous data IO

HPDC '08 Proceedings of the 17th international symposium on High performance distributed computing
Dynamically adapting file domain partitioning methods for collective I/O based on underlying parallel file system locking protocols

Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Massively parallel genomic sequence search on the Blue Gene/P architecture

Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Hiding I/O latency with pre-execution prefetching for parallel applications

Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Parallel I/O prefetching using MPI file caching and I/O signatures

Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Prefetch throttling and data pinning for improving performance of shared caches

Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Comparative evaluation of overlap strategies with study of I/O overlap in MPI-IO

ACM SIGOPS Operating Systems Review
Towards a High Performance Implementation of MPI-IO on the Lustre File System

OTM '08 Proceedings of the OTM 2008 Confederated International Conferences, CoopIS, DOA, GADA, IS, and ODBASE 2008. Part I on On the Move to Meaningful Internet Systems:
A collective I/O implementation based on inspector---executor paradigm

The Journal of Supercomputing
Data Locality Aware Strategy for Two-Phase Collective I/O

High Performance Computing for Computational Science - VECPAR 2008
A Prefetching Algorithm for Multi-speed Disks

Transactions on High-Performance Embedded Architectures and Compilers I
An implementation of parallel file distribution in an agent hierarchy

The Journal of Supercomputing
Y-lib: a user level library to increase the performance of MPI-IO in a lustre file system environment

Proceedings of the 18th ACM international symposium on High performance distributed computing
Performance Evaluation of Collective Write Algorithms in MPI I/O

ICCS '09 Proceedings of the 9th International Conference on Computational Science: Part I
Evaluating Algorithms for Shared File Pointer Operations in MPI I/O

ICCS '09 Proceedings of the 9th International Conference on Computational Science: Part I
Latency Hiding File I/O for Blue Gene Systems

CCGRID '09 Proceedings of the 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid
MPISec I/O: Providing Data Confidentiality in MPI-I/O

CCGRID '09 Proceedings of the 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid
Processing MPI Datatypes Outside MPI

Proceedings of the 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Terascale data organization for discovering multivariate climatic trends

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
PLFS: a checkpoint filesystem for parallel applications

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Data layout optimization for petascale file systems

Proceedings of the 4th Annual Workshop on Petascale Data Storage
pNFS, POSIX, and MPI-IO: a tale of three semantics

Proceedings of the 4th Annual Workshop on Petascale Data Storage
Implementation and Evaluation of File Write-Back and Prefetching for MPI-IO Over GPFS

International Journal of High Performance Computing Applications
On evaluating decentralized parallel I/O scheduling strategies for parallel file systems

VECPAR'06 Proceedings of the 7th international conference on High performance computing for computational science
A Scalable Message Passing Interface Implementation of an Ad-Hoc Parallel I/o system

International Journal of High Performance Computing Applications
A study of real world I/O performance in parallel scientific computing

PARA'06 Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing
InterferenceRemoval: removing interference of disk access for MPI programs through data replication

Proceedings of the 24th ACM International Conference on Supercomputing
MRAP: a novel MapReduce-based framework to support HPC analytics applications with access patterns

Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Cashing in on hints for better prefetching and caching in PVFS and MPI-IO

Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
A layout-aware optimization strategy for collective I/O

Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
IOrchestrator: Improving the Performance of Multi-node I/O Systems via Inter-Server Coordination

Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Evaluating I/O characteristics and methods for storing structured scientific data

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
A scheduling framework that makes any disk schedulers non-work-conserving solely based on request characteristics

FAST'11 Proceedings of the 9th USENIX conference on File and storage technologies
Exploiting Latent I/O Asynchrony in Petascale Science Applications

International Journal of High Performance Computing Applications
The impact of applications' I/O strategies on the performance of the Lustre parallel file system

International Journal of High Performance Systems Architecture
A cost-intelligent application-specific data layout scheme for parallel file systems

Proceedings of the 20th international symposium on High performance distributed computing
Six degrees of scientific data: reading patterns for extreme scale science IO

Proceedings of the 20th international symposium on High performance distributed computing
Software-directed data access scheduling for reducing disk energy consumption

Proceedings of the 20th international symposium on High performance distributed computing
Improving the average response time in collective I/O

EuroMPI'11 Proceedings of the 18th European MPI Users' Group conference on Recent advances in the message passing interface
Server-side I/O coordination for parallel file systems

Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
A parallel input-output system for resolving spatial data challenges: an agent-based model case study

Proceedings of the ACM SIGSPATIAL Second International Workshop on High Performance and Distributed Geographic Information Systems
Efficient data IO for a Parallel Global Cloud Resolving Model

Environmental Modelling & Software
A profiling approach for the management of writing in irregular applications

ISPA'06 Proceedings of the 2006 international conference on Frontiers of High Performance Computing and Networking
Self-adaptive hints for collective i/o

EuroPVM/MPI'06 Proceedings of the 13th European PVM/MPI User's Group conference on Recent advances in parallel virtual machine and message passing interface
Exploiting shared memory to improve parallel i/o performance

EuroPVM/MPI'06 Proceedings of the 13th European PVM/MPI User's Group conference on Recent advances in parallel virtual machine and message passing interface
Open MPI: a flexible high performance MPI

PPAM'05 Proceedings of the 6th international conference on Parallel Processing and Applied Mathematics
Cooperative write-behind data buffering for MPI i/o

PVM/MPI'05 Proceedings of the 12th European PVM/MPI users' group conference on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Symmetrical data sieving for noncontiguous i/o accesses in molecular dynamics simulations

PVM/MPI'05 Proceedings of the 12th European PVM/MPI users' group conference on Recent Advances in Parallel Virtual Machine and Message Passing Interface
AME: an anyscale many-task computing engine

Proceedings of the 6th workshop on Workflows in support of large-scale science
Towards scalable I/O architecture for exascale systems

Proceedings of the 2011 ACM international workshop on Many task computing on grids and supercomputers
Data driven infrastructure and policy selection to enhance scientific applications in grid

SAG'04 Proceedings of the First international conference on Scientific Applications of Grid Computing
Pattern-aware file reorganization in MPI-IO

Proceedings of the sixth workshop on Parallel Data Storage
Effective parallelization of loops in the presence of I/O operations

Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
SERA-IO: Integrating Energy Consciousness into Parallel I/O Middleware

CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Checkpointing Orchestration: Toward a Scalable HPC Fault-Tolerant Environment

CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Design and analysis of data management in scalable parallel scripting

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Visualization for the Physical Sciences

Computer Graphics Forum
Throttling I/O streams to accelerate file-IO performance

HPCC'07 Proceedings of the Third international conference on High Performance Computing and Communications
A New File-Specific Stripe Size Selection Method for Highly Concurrent Data Access

GRID '12 Proceedings of the 2012 ACM/IEEE 13th International Conference on Grid Computing
Improving Bandwidth Efficiency for Consistent Multistream Storage

ACM Transactions on Storage (TOS)
Abstractions and Middleware for Petascale Computing and Beyond

International Journal of Distributed Systems and Technologies
Petascale I/O: challenges, solutions, and recommendations

Proceedings of the Extreme Scaling Workshop
Orthrus: a framework for implementing high-performance collective I/O in the multicore clusters

Proceedings of the 22nd international symposium on High-performance parallel and distributed computing
VIDAS: object-based virtualized data sharing for high performance storage I/O

Proceedings of the 4th ACM workshop on Scientific cloud computing
Memory-conscious collective I/O for extreme scale HPC systems

Proceedings of the 3rd International Workshop on Runtime and Operating Systems for Supercomputers
Data deduplication in a hybrid architecture for improving write performance

Proceedings of the 3rd International Workshop on Runtime and Operating Systems for Supercomputers
ACIC: automatic cloud I/O configurator for HPC applications

SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Taming parallel I/O complexity with auto-tuning

SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Cost-intelligent application-specific data layout optimization for parallel file systems

Cluster Computing

Abstract

The I/O access patterns of parallel programs often consist of accesses to a large number of small, noncontiguous pieces of data. If an application's I/O needs are met by making many small, distinct I/O requests, however, the I/O performance degrades drastically. To avoid this problem, MPI-IO allows users to access a noncontiguous data set with a single I/O function call. This feature gives MPI-IO implementations an opportunity to optimize data access.

We describe how our MPI-IO implementation, ROMIO, delivers high performance in the presence of noncontiguous requests. We explain in detail the two key optimizations ROMIO performs: data sieving for noncontiguous requests from one process and collective I/O for noncontiguous requests from multiple processes. We describe how one can implement these optimizations portably on multiple machines and file systems, control their memory requirements, and also achieve high performance. We demonstrate the performance and portability with performance results for three applications (an astrophysics-application template, DIST3D; the NAS BTIO benchmark; and an unstructured code, UNSTRUC) on five different parallel machines: HP Exemplar, IBM SP, Intel Paragon, NEC SX-4, and SGI Origin2000.
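The idea behind data sieving can be illustrated with a small sketch. The code below is not ROMIO's implementation and does not use MPI; it is a minimal Python illustration on an ordinary file, assuming the commonly described form of the technique: rather than issuing one read per small piece, read the contiguous extent that covers all requested pieces in a few large chunks, then copy out only the wanted bytes in memory. The function name `sieved_read`, its `(offset, length)` request format, and the `buffer_size` parameter are hypothetical names chosen for this example. The `buffer_size` parameter stands in for the bounded memory requirement mentioned in the abstract.

```python
import os
import tempfile


def sieved_read(path, requests, buffer_size=4096):
    """Read noncontiguous (offset, length) pieces with few large reads.

    Hypothetical helper for illustration only (not a ROMIO or MPI-IO API).
    Returns (pieces, n_reads): the requested byte strings in request order,
    and the number of read calls issued against the file.
    """
    if not requests:
        return [], 0
    # Contiguous extent [start, end) that covers every requested piece.
    start = min(off for off, _ in requests)
    end = max(off + length for off, length in requests)
    # Pieces start zero-filled; bytes past end-of-file stay zero.
    pieces = [bytearray(length) for _, length in requests]
    n_reads = 0
    with open(path, "rb") as f:
        # Walk the extent in windows of at most buffer_size bytes so that
        # memory use stays bounded however large the extent is.
        pos = start
        while pos < end:
            f.seek(pos)
            chunk = f.read(min(buffer_size, end - pos))
            n_reads += 1
            if not chunk:
                break  # file is shorter than the requested extent
            lo, hi = pos, pos + len(chunk)
            # Copy the overlap of each request with this window.
            for i, (off, length) in enumerate(requests):
                a, b = max(off, lo), min(off + length, hi)
                if a < b:
                    pieces[i][a - off:b - off] = chunk[a - lo:b - lo]
            pos = hi
    return [bytes(p) for p in pieces], n_reads


if __name__ == "__main__":
    data = bytes(range(256)) * 4  # 1024 bytes of sample content
    fd, path = tempfile.mkstemp()
    os.write(fd, data)
    os.close(fd)
    # Sixteen 8-byte pieces, one every 64 bytes.
    reqs = [(i * 64, 8) for i in range(16)]
    pieces, n_reads = sieved_read(path, reqs, buffer_size=512)
    os.unlink(path)
    print(len(pieces), "pieces fetched with", n_reads, "read calls")
```

The trade-off in this sketch is that the bytes in the holes between pieces are read and then discarded, so the approach pays off only when the cost of many small requests outweighs the cost of the extra data transferred.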