The distributed shared memory (DSM) architecture simplifies the development of parallel programs by relieving the user of the tedious task of distributing data across processors. Furthermore, it allows incremental parallelization using, for example, OpenMP or Java threads. While it is easy to demonstrate good performance on a few processors, achieving good scalability still requires a good understanding of data flow in the application. In this paper we discuss ADAPT, an Automatic Data Alignment and Placement Tool, which detects data congestion in array-oriented FORTRAN codes and suggests code transformations to resolve it. We then show how the transformations suggested by ADAPT, including data blocking, data placement, data transposition, and page size control, improve the performance of the NAS Parallel Benchmarks.