The use of massively parallel SIMD array architectures is proliferating in the area of domain-specific coprocessors, yet these architectures have undergone few systematic empirical studies. The underlying problems include the size of the architecture space, the lack of portability of test programs, and the inherent complexity of simulating up to hundreds of thousands of processing elements. We address the computational cost problem with a novel approach to trace-based simulation. Code is run on an abstract virtual machine to generate a coarse-grained trace, which is then refined through a series of transformations (a process we call trace compilation) that progressively add resolution with respect to the details of the target machine. We have found this technique to be one to two orders of magnitude faster than instruction-level simulation while still retaining much of the model's accuracy. Furthermore, abstract machine traces need to be regenerated for only a small fraction of the possible parameter combinations. Virtual machine emulation and trace compilation also address program portability: the user codes in a single data parallel language with a single compiler, regardless of the target architecture. This technique has already been used to generate significant results for SIMD array architectures, a sample of which is presented here.
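The abstract describes trace compilation only at a high level. As a rough illustration, the following Python sketch shows the general shape of the idea under assumptions of our own: the trace record format (`AbstractOp`), the two refinement passes (folding virtual PEs onto physical PEs, then splitting operations into datapath-width slices), and the per-slice cycle costs in `TargetParams` are all hypothetical. They are not the authors' tool, trace format, or data.

```python
from dataclasses import dataclass
from math import ceil
from typing import Dict, Iterable, Iterator, List


@dataclass(frozen=True)
class AbstractOp:
    """One data-parallel instruction in a coarse-grained abstract trace (hypothetical format)."""
    name: str  # e.g. "add", "mul", "nbr_shift"
    bits: int  # operand width in bits


@dataclass
class TargetParams:
    """Hypothetical target-machine parameters, consulted only during refinement."""
    physical_pes: int                 # processing elements in the modeled array
    datapath_bits: int                # ALU width of each PE (1 = bit-serial)
    cycles_per_slice: Dict[str, int]  # placeholder cost table, not measured data


def virtualize(trace: Iterable[AbstractOp], virtual_pes: int,
               params: TargetParams) -> Iterator[AbstractOp]:
    """Pass 1: replay each op once per layer of virtual PEs folded onto a physical PE."""
    layers = ceil(virtual_pes / params.physical_pes)
    for op in trace:
        for _ in range(layers):
            yield op


def split_datapath(trace: Iterable[AbstractOp],
                   params: TargetParams) -> Iterator[str]:
    """Pass 2: break each op into datapath-width slices (micro-ops)."""
    for op in trace:
        for _ in range(ceil(op.bits / params.datapath_bits)):
            yield op.name


def trace_compile(trace: List[AbstractOp], virtual_pes: int,
                  params: TargetParams) -> int:
    """Chain the refinement passes and total the estimated cycle count."""
    micro_ops = split_datapath(virtualize(trace, virtual_pes, params), params)
    return sum(params.cycles_per_slice[name] for name in micro_ops)


# The abstract trace is produced once, independent of the target parameters...
trace = [AbstractOp("add", 16), AbstractOp("mul", 8), AbstractOp("nbr_shift", 16)]
costs = {"add": 1, "mul": 8, "nbr_shift": 1}

# ...and reused for every target configuration being explored.
configs = {
    "bit-serial, 1024 PEs": TargetParams(1024, 1, costs),
    "8-bit, 4096 PEs": TargetParams(4096, 8, costs),
}
for label, params in configs.items():
    print(label, trace_compile(trace, virtual_pes=4096, params=params))
```

Because the abstract trace does not depend on `TargetParams`, sweeping array size or datapath width reruns only the cheap refinement passes, which mirrors the abstract's point that abstract traces must be regenerated for only a few parameter combinations. A fuller model would chain further passes in the same style for other machine details.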