In adaptive irregular out-of-core applications, communication and large volumes of disk I/O account for a large portion of the overall execution time. This paper presents a program transformation scheme that enables overlap of communication, computation, and disk I/O in such applications. We take programs written in the inspector-executor model as a starting point and transform them into a pipelined form. Decomposing the inspector phase and reordering iterations exposes more opportunities for overlap, which the transformed programs exploit. In the experiments, our techniques are applied to two important applications: a partial differential equation solver and a molecular dynamics problem. For these applications, the versions employing our techniques are almost 30% faster than the inspector-executor versions.
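To make the idea of a pipelined inspector-executor concrete, here is a minimal single-process sketch. It is not the paper's transformation. It assumes a toy irregular loop `y[i] += x[idx[i]]`, where only part of `x` is held locally (the hypothetical dict `local`) and the remaining elements must be brought in by a hypothetical `fetch` function standing in for communication or disk I/O. The inspector is run per block rather than once for the whole loop. Each block's fetch is issued on a background thread before the previous block finishes. Iterations are reordered so that those touching only local data run while the fetch is still in flight. All function names and the `block` parameter are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def inspect_block(idx, lo, hi, local):
    """Inspector for one block: split iterations into local-only and remote
    ones, and collect the distinct non-local indices of x to fetch."""
    local_its, remote_its, needed = [], [], set()
    for i in range(lo, hi):
        if idx[i] in local:
            local_its.append(i)
        else:
            remote_its.append(i)
            needed.add(idx[i])
    return local_its, remote_its, sorted(needed)


def fetch(x_remote, needed):
    """Hypothetical stand-in for message passing or out-of-core disk reads."""
    return {j: x_remote[j] for j in needed}


def pipelined_irregular_loop(y, idx, local, x_remote, block=4):
    """Compute y[i] += x[idx[i]] block by block, overlapping each block's
    fetch with computation on data that is already available."""
    n = len(idx)
    bounds = [(lo, min(lo + block, n)) for lo in range(0, n, block)]
    if not bounds:
        return y
    with ThreadPoolExecutor(max_workers=1) as io:
        # Inspect the first block and start fetching its non-local data.
        plan = inspect_block(idx, *bounds[0], local)
        pending = io.submit(fetch, x_remote, plan[2])
        for k in range(len(bounds)):
            local_its, remote_its, _ = plan
            # Reordered iterations: local-only work runs while the fetch is pending.
            for i in local_its:
                y[i] += local[idx[i]]
            ghost = pending.result()  # wait for this block's non-local data
            # Inspect the next block and issue its fetch before finishing this one.
            if k + 1 < len(bounds):
                plan = inspect_block(idx, *bounds[k + 1], local)
                pending = io.submit(fetch, x_remote, plan[2])
            for i in remote_its:
                y[i] += ghost[idx[i]]
    return y


if __name__ == "__main__":
    x = [10 * j for j in range(8)]
    local = {j: x[j] for j in range(4)}  # this "process" owns x[0..3]
    idx = [5, 1, 7, 0, 3, 6, 2, 4, 5]
    y = pipelined_irregular_loop([0] * len(idx), idx, local, x, block=4)
    print(y)  # [50, 10, 70, 0, 30, 60, 20, 40, 50]
```

Because the stand-in `fetch` is an in-memory lookup, CPython gains no real speedup here. The sketch only shows the control structure. In a real system, the background task would issue nonblocking messages or asynchronous disk reads, and the block size would trade inspector overhead against the amount of overlap.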