The speed and efficiency of the memory system are key limiting factors in the performance of supercomputers. Consequently, one of the major concerns when developing a high-performance code, either manually or automatically, is determining and characterizing the influence of the memory system on performance in terms of algorithmic parameters. Unfortunately, the performance data available to an algorithm designer, such as various benchmarks and, occasionally, manufacturer-supplied information (e.g., instruction timings and architecture component characteristics), are rarely sufficient for this task. In this paper, we discuss a systematic methodology for probing the performance characteristics of a memory system via a hierarchy of data-movement kernels. We present and analyze the results obtained with this methodology on a cache-based multi-vector processor (Alliant FX/8). Finally, we indicate how these experimental results can be used for predicting the performance of simple Fortran codes by a combination of empirical observations, architectural models, and analytical techniques.
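To make the idea of a data-movement kernel concrete, the following is a minimal sketch of one possible probe: a strided load-and-store loop timed over several strides. It is not the kernel hierarchy used in the paper; the names `copy_kernel` and `probe`, the choice of strides, and the use of Python rather than Fortran are all assumptions made for illustration. Interpreter overhead dominates in Python, so the numbers show only the shape of such an experiment, not real memory-system behavior.

```python
import time
from array import array


def copy_kernel(src, dst, n, stride):
    """Move n elements from src to dst, reading src with the given stride."""
    j = 0
    for i in range(n):
        dst[i] = src[j]
        j += stride
    return n


def probe(n, strides, repeats=3):
    """Return {stride: best observed seconds per element} for the copy kernel.

    Hypothetical helper: taking the best of several repeats is one common way
    to reduce timing noise, not a choice taken from the paper.
    """
    results = {}
    for s in strides:
        src = array("d", [1.0]) * (n * s)   # large enough for the last strided read
        dst = array("d", [0.0]) * n
        best = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            copy_kernel(src, dst, n, s)
            best = min(best, time.perf_counter() - t0)
        results[s] = best / n
    return results


if __name__ == "__main__":
    for stride, t in probe(1 << 14, [1, 2, 4, 8, 16]).items():
        print(f"stride {stride:2d}: {t * 1e9:8.1f} ns/element")
```

In a compiled language on a real machine, sweeping the vector length and stride in this way would be expected to expose cache-capacity and bank-conflict effects, which is the kind of parameterized data an analytical model could then consume.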