A performance evaluation method for comparing and improving concurrent uniprocessor architectures is introduced, and a detailed case study applying it to two machines, the IBM RS/6000 and the Astronautics ZS-1, is reported. The architectures of both machines are described and their measured performance is given. A model that yields hard bounds on a machine's performance on a given benchmark is then presented. The sources of performance loss, i.e., the gap between the model-derived bounds and the measured performance, are examined, and areas where the architectures could be improved are identified.
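The abstract does not give the bound model's equations. As a minimal sketch, assuming a simple throughput-bound model in which every functional unit is fully pipelined and the most heavily loaded unit limits each loop iteration (a common form of bound, not necessarily the one used in this work), a hard lower bound on cycles and the resulting "performance loss" could be computed as below. The unit names, the one-op-per-cycle machine, the operation counts and the 4.0-cycle "measured" time are all hypothetical illustrations, not data from the study.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One functional unit; assumed fully pipelined, accepting ops_per_cycle new ops each cycle."""
    name: str
    ops_per_cycle: float


def cycle_lower_bound(op_counts: dict[str, int], units: dict[str, Unit]) -> float:
    """Hard lower bound on cycles per loop iteration (assumed throughput-bound model).

    Each unit must process every operation routed to it, so the busiest unit
    limits throughput no matter how well the instructions overlap.
    """
    return max(count / units[name].ops_per_cycle for name, count in op_counts.items())


def performance_loss(bound_cycles: float, measured_cycles: float) -> float:
    """Fraction of the measured time that the bound does not account for."""
    if measured_cycles < bound_cycles:
        raise ValueError("measured time is below a hard lower bound; model or measurement is wrong")
    return (measured_cycles - bound_cycles) / measured_cycles


# Hypothetical machine: one unit of each kind, each issuing one operation per cycle.
UNITS = {n: Unit(n, 1.0) for n in ("load_store", "fp_add", "fp_mul")}

# Hypothetical loop body for y[i] = a * x[i] + y[i]: two loads plus one store, one multiply, one add.
DAXPY_OPS = {"load_store": 3, "fp_mul": 1, "fp_add": 1}

if __name__ == "__main__":
    bound = cycle_lower_bound(DAXPY_OPS, UNITS)
    print(bound)                         # 3.0 cycles per iteration (load/store unit is the bottleneck)
    print(performance_loss(bound, 4.0))  # 0.25, if a hypothetical 4.0 cycles were measured
```

Because the bound is a lower limit on execution time, a measurement below it signals an error in the model or the measurement, which is why `performance_loss` rejects that case rather than returning a negative value.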