This paper describes the architecture of a RISC-based multiprocessor chip. The processors operate in MIMD fashion, executing parallel instruction streams generated by a parallelizing compiler to exploit fine-grained parallelism. Low-cost synchronization mechanisms are supported in hardware, and as a result the system tolerates unpredictable delays in the progress of individual streams. Instruction-level parallelism is exploited through register channels and a mechanism for the collective branching of processors. For efficient synchronization during the parallel execution of loops, fuzzy barriers are provided. On-chip memory is organized into multiple banks to provide sufficient bandwidth for the processors. The RISC processors are based on the Sun SPARC architecture.
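The abstract does not give the hardware protocol for fuzzy barriers, so the following is only a minimal software analogue, assuming the usual split-phase reading of the idea. A processor signals arrival when it enters a "barrier region", keeps executing independent instructions from that region, and stalls only at the region's end if some other stream has not yet arrived. The class `FuzzyBarrier`, its `arrive`/`wait` methods, and `run_demo` are hypothetical names used for illustration; they are not the chip's interface.

```python
import threading


class FuzzyBarrier:
    """Hypothetical split-phase barrier modelling a fuzzy barrier in software.

    arrive() marks entry into the barrier region without blocking;
    wait() blocks only at the end of the region, and only if some
    party has not yet arrived for the same episode.
    """

    def __init__(self, parties: int) -> None:
        self.parties = parties
        self._count = 0          # arrivals in the current episode
        self._generation = 0     # episode number, bumped when all parties arrive
        self._cond = threading.Condition()

    def arrive(self) -> int:
        """Signal arrival and return the episode token to pass to wait()."""
        with self._cond:
            token = self._generation
            self._count += 1
            if self._count == self.parties:
                # Last arrival completes the episode and releases the waiters.
                self._count = 0
                self._generation += 1
                self._cond.notify_all()
            return token

    def wait(self, token: int) -> None:
        """Block until every party has arrived for the episode `token`."""
        with self._cond:
            while self._generation == token:
                self._cond.wait()


def run_demo(parties: int = 4, iterations: int = 3) -> list[list[int]]:
    """Each worker writes one slot per loop iteration, then reads the whole row."""
    barrier = FuzzyBarrier(parties)
    shared = [[0] * parties for _ in range(iterations)]
    seen = [[0] * parties for _ in range(iterations)]
    scratch = [0] * parties  # private per-worker state, never read by others

    def worker(pid: int) -> None:
        for it in range(iterations):
            shared[it][pid] = (it + 1) * (pid + 1)  # work before the region
            token = barrier.arrive()                # enter region without stalling
            scratch[pid] += it                      # independent work inside the region
            barrier.wait(token)                     # stall only if another worker is late
            seen[it][pid] = sum(shared[it])         # all writes for `it` are now visible

    threads = [threading.Thread(target=worker, args=(p,)) for p in range(parties)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


print(run_demo())  # [[10, 10, 10, 10], [20, 20, 20, 20], [30, 30, 30, 30]]
```

The work placed between `arrive` and `wait` is what absorbs skew between streams: a worker that arrives early does useful work instead of idling. This is one plausible way to picture how such a barrier helps tolerate unpredictable delays in individual streams, not a description of the hardware implementation.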