An I/O-Conscious Tiling Strategy for Disk-Resident Data Sets

Authors:
Mahmut Kandemir; Alok Choudhary; J. Ramanujam
Affiliations:
CSE Department, The Pennsylvania State University, University Park, PA 16802 (kandemir@cse.psu.edu)
ECE Department, Northwestern University, Evanston, IL 60208 (choudhar@ece.nwu.edu)
ECE Department, Louisiana State University, Baton Rouge, LA 70803 (jxr@ee.lsu.edu)
Venue:
The Journal of Supercomputing
Year:
2002

Abstract

This paper describes a tiling technique that application programmers and optimizing compilers can use to obtain I/O-efficient versions of regular scientific loop nests. Because of the particular characteristics of I/O operations, a straightforward extension of traditional tiling to I/O-intensive programs may result in poor I/O performance. The technique presented here therefore adapts iteration-space tiling to I/O-performing loop nests. The generated code substantially reduces both the number of I/O calls and the volume of data transferred between the disk subsystem and main memory. Our experimental results on the IBM SP-2, a distributed-memory message-passing multiprocessor, show that reducing these two quantities can markedly decrease the overall execution times of I/O-intensive loop nests. On a number of loop nests extracted from several benchmarks and math libraries, the technique improved execution times by an average of 42.5% for one data set and 47.4% for another.
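To make the I/O-call argument concrete, the following Python sketch counts I/O calls under a deliberately simplified model that is an assumption, not the paper's cost model: an N x N array is stored row-major in a single file, and every contiguous run of elements read from disk costs one I/O call. Under that model, a conventional b x b tile needs b calls because its row segments are not adjacent on disk, whereas a tile of the same memory footprint built from whole rows is a single contiguous read. The function names `calls_square_tiles` and `calls_row_slabs` are hypothetical, and the row-slab shape is only one illustration of how tile shape interacts with file layout, not the exact tiling algorithm of the paper.

```python
# Simplified I/O-call model (an assumption for illustration, not the paper's
# cost model): an N x N array is stored row-major in a file, and each
# contiguous run of elements read from disk costs exactly one I/O call.


def calls_square_tiles(n: int, b: int) -> int:
    """I/O calls to sweep the array with b x b tiles (assumes b divides n)."""
    tiles = (n // b) ** 2
    # A b x b tile touches b row segments that are not contiguous on disk,
    # so each tile needs b separate calls.
    return tiles * b


def calls_row_slabs(n: int, mem: int) -> int:
    """I/O calls when each tile is a slab of full rows fitting in `mem` elements."""
    rows_per_slab = mem // n
    if rows_per_slab == 0:
        raise ValueError("memory budget is smaller than one row")
    # Consecutive full rows are contiguous in a row-major file: one call per slab.
    return -(-n // rows_per_slab)  # ceiling division: number of slabs


if __name__ == "__main__":
    n, b = 1024, 64
    mem = b * b  # same in-core memory budget for both tile shapes
    print("square tiles:", calls_square_tiles(n, b))
    print("row slabs:   ", calls_row_slabs(n, mem))
```

With N = 1024 and a 4096-element memory budget, the script prints 16384 calls for square tiles and 256 for row slabs, a 64-fold difference that comes entirely from tile shape. A realistic model would also account for per-call latency, transfer bandwidth, and data reuse across arrays, all of which this sketch ignores.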