This paper describes Pica, a fine-grain, message-passing architecture designed to efficiently support high-throughput, low-memory parallel applications such as image processing, object recognition, and data compression. Specializing the processor and reducing local memory to 4,096 36-bit words per node allows multiple nodes to be implemented on a single chip, so high-performance systems for high-throughput applications can be built at lower cost. The architecture minimizes the overhead of basic parallel operations: an operand-addressed context cache and a round-robin task manager support fast task swapping, fixed-size activation contexts simplify storage management, and word-tag synchronization bits provide low-cost synchronization. Several applications have been developed for this architecture, including thermal relaxation, matrix multiplication, JPEG image compression, and positron emission tomography (PET) image reconstruction, and have been executed on an instrumented instruction-level simulator. The results of these experiments and an evaluation of Pica's architectural features are presented.
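To make the interplay of word-tag synchronization, round-robin task swapping, and fixed-size activation contexts more concrete, the following is a toy Python model of a single node. It is a sketch under stated assumptions, not Pica's instruction set or the authors' simulator. The generator-based tasks, the hypothetical `("read", addr)` / `("write", addr, value)` operations, the 32-word context size, one-operation-per-turn scheduling, and the rule that a read leaves the tag set are all illustrative choices. The operand-addressed context cache is not modeled.

```python
from collections import deque

WORD_MASK = (1 << 36) - 1  # Pica words are 36 bits wide
MEMORY_WORDS = 4096        # local memory size stated in the abstract
CONTEXT_WORDS = 32         # hypothetical fixed activation-context size


class Node:
    """Toy model of one node: tag-bit memory plus a round-robin task manager."""

    def __init__(self, num_contexts=4):
        self.data = [0] * MEMORY_WORDS
        self.full = [False] * MEMORY_WORDS  # one synchronization tag per word
        # Fixed-size contexts at the top of memory: allocation and release are
        # a pop/append on a free list, with no fragmentation to manage.
        top = MEMORY_WORDS - num_contexts * CONTEXT_WORDS
        self.free_contexts = [top + i * CONTEXT_WORDS for i in range(num_contexts)]
        self.ready = deque()  # entries: (task, pending operation, context base)

    def spawn(self, task_fn, *args):
        base = self.free_contexts.pop()  # raises IndexError if none are free
        self.ready.append((task_fn(base, *args), None, base))

    def run(self):
        """Give each task one operation per turn, in round-robin order."""
        while self.ready:
            task, op, base = self.ready.popleft()
            try:
                if op is None:
                    op = next(task)  # first operation of a new task
                kind, addr, *rest = op
                if kind == "write":
                    self.data[addr] = rest[0] & WORD_MASK
                    self.full[addr] = True  # mark the word full
                    op = task.send(None)
                elif self.full[addr]:
                    # Assumption: reads leave the tag set (no consume semantics).
                    op = task.send(self.data[addr])
                # Otherwise the word is empty: keep `op` and retry next turn.
                self.ready.append((task, op, base))
            except StopIteration:
                self.free_contexts.append(base)  # task done, recycle its context


def producer(base, slot, values):
    # Hypothetical task: sum its inputs, then fill the shared slot.
    yield ("write", slot, sum(values))


def consumer(base, slot, out):
    value = yield ("read", slot)      # waits until the producer fills `slot`
    yield ("write", base, value + 1)  # keep a local in the task's own context
    local = yield ("read", base)
    yield ("write", out, local)


node = Node()
node.spawn(consumer, 0, 1)  # spawned first, so it must wait on word 0
node.spawn(producer, 0, [1, 2, 3])
node.run()
print(node.data[1], node.full[1], len(node.free_contexts))  # 7 True 4
```

Because the consumer is spawned first, its first read finds word 0 empty, so it gives up its turn without any lock or explicit message. After the producer fills the word, the consumer's retry succeeds. A real task manager might park blocked tasks rather than re-polling them, and the abstract does not say which approach Pica uses.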