In this paper, we consider the design of high-performance SIMD architectures. We examine three mechanisms for improving the performance of this class of machines that have been largely unexplored by the SIMD community: pipelined instruction broadcast, pipelining of the PE architecture, and a novel memory hierarchy in the PE address space, which we denote the direct only data cache (dod-cache). For each improvement, we develop an analytical model of the potential speedup and apply it to real program traces obtained on a MasPar MP-2 system. In addition, we consider the impact of all three improvements taken together.
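To illustrate what applying an analytical speedup model to a program trace can look like, here is a minimal sketch. It is not the paper's model: it assumes a simple Amdahl-style estimate in which each operation class in a trace is accelerated by its own factor. The function name `estimate_speedup`, the class labels, the cycle counts, and the speedup factors are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def estimate_speedup(trace, per_class_speedup):
    """Amdahl-style estimate of overall speedup for a trace.

    trace: iterable of (operation_class, cycles) pairs.
    per_class_speedup: maps an operation class to its assumed speedup
    factor; classes not listed are treated as unchanged (factor 1.0).
    """
    cycles_by_class = Counter()
    for op_class, cycles in trace:
        cycles_by_class[op_class] += cycles

    total_before = sum(cycles_by_class.values())
    total_after = sum(
        cycles / per_class_speedup.get(op_class, 1.0)
        for op_class, cycles in cycles_by_class.items()
    )
    return total_before / total_after


# Hypothetical trace: every label and number below is made up.
trace = [("broadcast", 40), ("pe_compute", 40), ("pe_memory", 20)]

# One improvement in isolation (only instruction broadcast gets faster).
single = estimate_speedup(trace, {"broadcast": 2.0})

# All three assumed improvements taken together.
combined = estimate_speedup(
    trace, {"broadcast": 2.0, "pe_compute": 2.0, "pe_memory": 4.0}
)
```

Under this simplification, the combined estimate is limited by whichever class still dominates the remaining cycles, which is why it helps to evaluate the improvements together as well as one at a time.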