The ever-increasing complexity of large-scale applications and the continuous growth in the size of the data they process make maximizing the performance of such applications very challenging. In particular, many applications in astrophysics, medicine, biology, computational chemistry, and materials science are extremely data intensive. Such applications typically use a disk system to store and later retrieve their large data sets, so their disk performance is a critical concern. Unfortunately, while disk density has improved significantly over the last couple of decades, disk access latencies have not. As a result, I/O is increasingly becoming a bottleneck for data-intensive applications, and it has to be addressed at the software level if we want to extract maximum performance from modern computer architectures. This paper presents a compiler-directed code restructuring scheme for improving the I/O performance of data-intensive scientific applications. The proposed approach improves I/O performance by reducing the number of disk accesses through a new concept called disk reuse maximization. In this context, disk reuse refers to reusing the data in a given set of disks as much as possible before moving on to other disks. Our compiler-based approach restructures application code, with the help of a polyhedral tool, so that disk reuse is maximized to the extent allowed by the intrinsic data dependencies in the application code. The proposed optimization can be applied either to each loop nest individually or to the entire application code. The experiments show that the average I/O improvements brought by the loop-nest-based version of our approach are 9.0% and 2.7% over the original application codes and the codes optimized using conventional schemes, respectively.
Further, when our approach is applied to the entire application code, the average improvements are 15.0% and 13.5% over the original application codes and the codes optimized using conventional schemes, respectively. This paper also discusses how careful file layout selection further increases our performance gains, and how the proposed approach can be extended to work with parallel applications.
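The notion of disk reuse can be made concrete with a toy model. The Python sketch below is not the authors' compiler transformation; it assumes a hypothetical round-robin striping layout (`NUM_DISKS`, `STRIPE_ELEMS`), a loop whose iterations are fully independent, and uses the number of disk switches between consecutive iterations as a stand-in cost. Under those assumptions, grouping iterations by the disk they access uses up each disk's data before moving on to the next disk.

```python
from typing import Callable, List, Sequence

# Hypothetical layout parameters (not from the paper): a 1-D array stored in a
# file striped round-robin over NUM_DISKS disks, with STRIPE_ELEMS
# consecutive elements per stripe unit.
NUM_DISKS = 4
STRIPE_ELEMS = 8

Access = Callable[[int], int]  # maps an iteration index to the element it reads


def disk_of(elem: int) -> int:
    """Disk that holds array element `elem` under round-robin striping."""
    return (elem // STRIPE_ELEMS) % NUM_DISKS


def disk_switches(order: Sequence[int], access: Access) -> int:
    """Toy cost proxy: how often consecutive iterations touch different disks."""
    disks = [disk_of(access(i)) for i in order]
    return sum(1 for a, b in zip(disks, disks[1:]) if a != b)


def disk_reuse_order(iterations: Sequence[int], access: Access) -> List[int]:
    """Group iterations by the disk they touch so each disk's data is reused
    before moving on. Only legal if the iterations are independent; the
    stable sort keeps the original order within each disk group."""
    return sorted(iterations, key=lambda i: disk_of(access(i)))


def strided(i: int) -> int:
    """Example access pattern: iteration i reads the first element of stripe unit i."""
    return i * STRIPE_ELEMS


if __name__ == "__main__":
    original = list(range(16))
    reordered = disk_reuse_order(original, strided)
    print(disk_switches(original, strided))   # 15
    print(disk_switches(reordered, strided))  # 3
```

With the strided pattern, the original order cycles through disks 0, 1, 2, 3 on every iteration, while the reordered loop visits each disk once in a contiguous block. In real codes the legal reorderings are constrained by data dependencies, which is why the approach described above relies on a polyhedral tool rather than a simple sort.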