Improving I/O performance of applications through compiler-directed code restructuring

Authors:
Mahmut Kandemir; Seung Woo Son; Mustafa Karakoy
Affiliations:
Department of Computer Science and Engineering, The Pennsylvania State University; Department of Computer Science and Engineering, The Pennsylvania State University; Department of Computing, Imperial College
Venue:
FAST'08 Proceedings of the 6th USENIX Conference on File and Storage Technologies
Year:
2008

Abstract

The ever-increasing complexity of large-scale applications, together with continuous growth in the size of the data they process, makes maximizing the performance of such applications very challenging. In particular, many applications from astrophysics, medicine, biology, computational chemistry, and materials science are extremely data intensive. Such applications typically use a disk system to store and later retrieve their large data sets, so disk performance is a critical concern. Unfortunately, while disk density has improved significantly over the last couple of decades, disk access latencies have not. As a result, I/O is increasingly becoming a bottleneck for data-intensive applications and has to be addressed at the software level if we want to extract the maximum performance from modern computer architectures. This paper presents a compiler-directed code restructuring scheme for improving the I/O performance of data-intensive scientific applications. The proposed approach improves I/O performance by reducing the number of disk accesses through a new concept called disk reuse maximization. In this context, disk reuse refers to reusing the data in a given set of disks as much as possible before moving to other disks. Our compiler-based approach restructures application code, with the help of a polyhedral tool, such that disk reuse is maximized to the extent allowed by the intrinsic data dependencies in the application code. The proposed optimization can be applied to each loop nest individually or to the entire application code. Experiments show that the loop-nest-based version of our approach improves I/O performance by an average of 9.0% over the original application codes and 2.7% over the codes optimized using conventional schemes. When our approach is applied to the entire application code, the average improvements are 15.0% and 13.5%, respectively. This paper also discusses how careful file layout selection helps to improve our performance gains, and how the proposed approach can be extended to work with parallel applications.
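
The idea of disk reuse can be illustrated with a small sketch. It is not the paper's algorithm, which relies on a polyhedral tool and respects data dependencies. It assumes a hypothetical round-robin striping of array elements across disks and a loop whose iterations are fully independent, so any reordering is legal. The names disk_of, count_disk_switches, and disk_reuse_order are invented for this illustration.

```python
from typing import Iterable, List


def disk_of(index: int, stripe_size: int, num_disks: int) -> int:
    """Assumed layout: elements are striped round-robin across the disks."""
    return (index // stripe_size) % num_disks


def count_disk_switches(order: Iterable[int], stripe_size: int, num_disks: int) -> int:
    """Count how often consecutive accesses move from one disk to another."""
    switches = 0
    previous = None
    for index in order:
        current = disk_of(index, stripe_size, num_disks)
        if previous is not None and current != previous:
            switches += 1
        previous = current
    return switches


def disk_reuse_order(n: int, stripe_size: int, num_disks: int) -> List[int]:
    """Reorder independent iterations so all accesses to one disk are adjacent.

    This is only valid under the stated assumption that the iterations carry
    no data dependencies; a real compiler would have to check this.
    """
    # sorted() is stable, so iterations on the same disk keep their original order.
    return sorted(range(n), key=lambda i: disk_of(i, stripe_size, num_disks))


if __name__ == "__main__":
    n, stripe_size, num_disks = 16, 2, 4
    original = list(range(n))
    restructured = disk_reuse_order(n, stripe_size, num_disks)
    print("original switches:    ", count_disk_switches(original, stripe_size, num_disks))
    print("restructured switches:", count_disk_switches(restructured, stripe_size, num_disks))
```

In this toy setting the restructured order visits each disk once and finishes with it before moving on, which is the behavior the abstract describes as disk reuse. The switch count is only a proxy for illustration and says nothing about the performance figures reported above.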