ICS '92 Proceedings of the 6th international conference on Supercomputing
In this paper we discuss the performance prediction of Fortran constructs commonly found in numerical scientific computing. Although the approach applies to multiprocessors in general, we concentrate here on the Alliant FX/8 multiprocessor. The proposed techniques combine empirical observations, architectural models, and analytical techniques, and they exploit earlier work on data locality analysis and on the empirical characterization of memory-system behavior. The Lawrence Livermore Loops serve as a test case to verify the approach.