An extended two-phase method for accessing sections of out-of-core arrays

Authors:
Rajeev Thakur; Alok Choudhary
Venue:
Scientific Programming
Year:
1996



Abstract

A number of applications on parallel computers deal with very large data sets that cannot fit in main memory. In such applications, data must be stored in files on disks and fetched into memory during program execution. Parallel programs with large out-of-core arrays stored in files must read/write smaller sections of the arrays from/to files. In this article, we describe a method for accessing sections of out-of-core arrays efficiently. Our method, the extended two-phase method, uses collective I/O: Processors cooperate to combine several I/O requests into fewer larger granularity requests, to reorder requests so that the file is accessed in proper sequence, and to eliminate simultaneous I/O requests for the same data. In addition, the I/O workload is divided among processors dynamically, depending on the access requests. We present performance results obtained from two real out-of-core parallel applications - matrix multiplication and a Laplace's equation solver - and several synthetic access patterns, all on the Intel Touchstone Delta. These results indicate that the extended two-phase method significantly outperformed a direct (noncollective) method for accessing out-of-core array sections.
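
The cooperation described in the abstract can be illustrated with a small single-process simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`merge_intervals`, `partition_domains`, `collective_read`), the use of half-open byte ranges, and the even split of the merged ranges among processors are all hypothetical choices made for illustration. The article's actual dynamic partitioning and interprocessor communication may differ.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open byte range [start, end) in the file


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Sort ranges by file offset and coalesce overlapping or adjacent ones."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def partition_domains(merged: List[Interval], nprocs: int) -> List[List[Interval]]:
    """Split the merged ranges into nprocs file domains of near-equal size.

    Assumption: an even split of the requested bytes stands in for the
    request-dependent workload division mentioned in the abstract.
    """
    total = sum(end - start for start, end in merged)
    share = -(-total // nprocs)  # ceiling division
    domains: List[List[Interval]] = [[] for _ in range(nprocs)]
    proc, room = 0, share
    for start, end in merged:
        while start < end:
            take = min(room, end - start)
            domains[proc].append((start, start + take))
            start += take
            room -= take
            if room == 0 and proc < nprocs - 1:
                proc, room = proc + 1, share
    return domains


def collective_read(file_bytes: bytes, requests: Dict[int, List[Interval]]):
    """Simulate a two-phase collective read; returns (data per rank, I/O calls)."""
    nprocs = len(requests)
    all_ranges = [r for ranges in requests.values() for r in ranges]
    domains = partition_domains(merge_intervals(all_ranges), nprocs)

    # Phase 1: each processor reads only its own file domain, in file order.
    staged: Dict[int, int] = {}  # file offset -> byte, stands in for buffers
    io_calls = 0
    for domain in domains:
        for start, end in domain:
            io_calls += 1  # one contiguous read per domain piece
            for offset in range(start, end):
                staged[offset] = file_bytes[offset]

    # Phase 2: redistribute staged data so each rank gets what it asked for.
    result = {
        rank: [bytes(staged[o] for o in range(s, e)) for s, e in ranges]
        for rank, ranges in requests.items()
    }
    return result, io_calls
```

As a usage example, suppose four processors request the byte ranges `{0: [(0, 8), (16, 24)], 1: [(8, 16), (24, 32)], 2: [(4, 12)], 3: [(20, 32)]}` from a 64-byte file. A direct method would issue six separate reads, several of them overlapping. In this sketch the ranges merge into the single range `[0, 32)`, which is split into four contiguous 8-byte domains, so `collective_read` issues four reads and no byte is fetched from the file twice.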