The MACS performance model introduced here can be applied to a Machine and Application of interest, the Compiler-generated workload, and the Scheduling of the workload by the compiler. The MA, MAC, and MACS bounds each fix the named subset of M, A, C, and S while freeing the bound from the constraints imposed by the others. A/X performance measurement is used to measure access-only and execute-only code performance. Such hierarchical performance modeling exposes the gaps between the various bounds, the A/X measurements, and the actual performance, thereby focusing performance optimization at the appropriate levels in a systematic and goal-directed manner. A simple, but detailed, case study of the Convex C-240 vector mini-supercomputer illustrates the method.
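The gap analysis described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' tooling: the function names, the cycles-per-iteration unit, and every number are hypothetical, and the A/X measurements are left out for brevity. It assumes each bound is a lower bound on run time, so the values do not decrease as more of M, A, C, and S are fixed.

```python
# Hypothetical illustration of hierarchical gap analysis.
# All names and numbers are assumptions, not values from the paper.

def gap_report(levels):
    """Return the gap between each pair of adjacent levels.

    levels: list of (name, time) pairs ordered from the loosest bound
    to the measured run time, in any consistent time unit.
    """
    return [
        (f"{lower_name} -> {upper_name}", upper - lower)
        for (lower_name, lower), (upper_name, upper) in zip(levels, levels[1:])
    ]


def largest_gap(levels):
    """Return the (label, gap) pair with the biggest gap."""
    return max(gap_report(levels), key=lambda item: item[1])


# Made-up cycles per loop iteration for one loop.
levels = [
    ("MA", 2.0),        # machine and application only
    ("MAC", 3.0),       # adds the compiler-generated workload
    ("MACS", 3.5),      # adds the compiler's schedule
    ("measured", 5.0),  # actual run time
]

for label, gap in gap_report(levels):
    print(f"{label}: {gap:.1f}")
print("largest gap:", largest_gap(levels)[0])
```

With these made-up figures the largest gap lies between the MACS bound and the measured time, which would point to effects that none of the modeled levels capture; a large MA to MAC gap would instead direct attention to the compiler-generated workload.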