Software-based thread-level parallelization has been widely studied as a means of exploiting data parallelism in purely computational loops to improve program performance on multiprocessors. However, none of these prior efforts addresses the efficient parallelization of hybrid loops, i.e., loops that mix computation with I/O operations. In this paper, we propose a set of techniques for efficiently parallelizing hybrid loops. Our techniques apply DOALL parallelism to hybrid loops by breaking the cross-iteration dependences caused by I/O operations. We also support speculative execution of I/O operations, enabling speculative parallelization of hybrid loops, and use helper threading to reduce the I/O bus contention that the increased parallelism creates. We provide an easy-to-use programming model for exploiting parallelism in loops with I/O operations; parallelizing a hybrid loop with our model requires only a few modifications to the code. We have developed a prototype implementation of this programming model and evaluated it on a 24-core machine using eight applications, including a widely used genomic sequence assembler, a multi-player game server, and programs from the PARSEC and SPEC CPU2000 benchmark suites. The hybrid loops in these applications account for 23%-99% of total execution time on our 24-core machine. With hybrid loop parallelization, the applications achieve speedups of 3.0x-12.8x over their sequential versions, and they outperform versions in which only the computation loops are parallelized by 68% on average.
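The abstract does not spell out the mechanism, so the following C/OpenMP sketch illustrates one plausible way to break an I/O-induced cross-iteration dependence: a loop that reads records from a shared file stream is sequential only because every iteration depends on the file position left by its predecessor, and reading each record with pread() at a precomputed offset removes that dependence, making the loop DOALL. This is a minimal sketch under stated assumptions, not the paper's actual API; the file name records.dat, the fixed record size, and the process() routine are all illustrative.

/* Minimal sketch (an assumption, not the paper's programming model):
 * the only cross-iteration dependence in this hybrid loop is the
 * shared file position. pread() at offset i*RECSZ makes each
 * iteration independent, so a plain OpenMP worksharing construct
 * can run the loop as DOALL. Compile with: gcc -fopenmp hybrid.c */
#include <fcntl.h>
#include <omp.h>
#include <stdio.h>
#include <unistd.h>

#define NRECS 1024
#define RECSZ 64            /* fixed-size records keep offsets trivial */

/* stand-in for the per-record computation */
static long process(const char *rec) {
    long sum = 0;
    for (int i = 0; i < RECSZ; i++)
        sum += rec[i];
    return sum;
}

int main(void) {
    int fd = open("records.dat", O_RDONLY);   /* illustrative input file */
    if (fd < 0) { perror("open"); return 1; }

    static long results[NRECS];

    /* DOALL: iteration i reads at offset i*RECSZ, independent of the
     * file position left behind by any other iteration. */
    #pragma omp parallel for
    for (int i = 0; i < NRECS; i++) {
        char rec[RECSZ];
        if (pread(fd, rec, RECSZ, (off_t)i * RECSZ) == RECSZ)
            results[i] = process(rec);
        else
            results[i] = -1;                  /* short read */
    }

    close(fd);
    for (int i = 0; i < NRECS; i++)
        printf("%d: %ld\n", i, results[i]);
    return 0;
}

Variable-length records would need a sequential offset-scan pass before the parallel loop; the speculative execution of I/O operations and the helper-thread prefetching that the abstract also describes are beyond this sketch.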