SCMP: a single-chip message-passing parallel computer

Authors:
James M. Baker, Jr.; Brian Gold; Mark Bucciero; Sidney Bennett; Rajneesh Mahajan; Priyadarshini Ramachandran; Jignesh Shah
Affiliations:
Department of Mathematics and Computer Science, Virginia Military Institute, Lexington, VA; Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA; Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA; Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA; Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA; Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA; Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA
Venue:
The Journal of Supercomputing - Special issue: Parallel and distributed processing and applications
Year:
2004

Abstract

As technology improves and transistor feature sizes continue to shrink, on-chip interconnect wire latency will increasingly limit processor clock speeds. In addition, as we reach the limits of the instruction-level parallelism that can be extracted from application programs, there will be an increased emphasis on thread-level parallelism. To continue to improve performance, computer architects will need to focus on architectures that efficiently support thread-level parallelism while minimizing the length of on-chip interconnect wires. The SCMP (Single-Chip Message-Passing) parallel computer system is one such architecture. The SCMP system includes up to 64 processors on a single chip, connected in a 2-D mesh with nearest-neighbor connections. Memory is included on-chip with the processors, and the architecture includes hardware support for communication and for the execution of parallel threads. Because there are no global signals or shared resources between the processors, the length of the interconnect wires is determined by the size of an individual processor, not the size of the entire chip. Avoiding long interconnect wires will allow the use of very high clock frequencies, which, when coupled with the use of multiple processors, will offer high aggregate computational power.
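
To make the nearest-neighbor mesh concrete, the following is a minimal sketch and not the SCMP implementation. It assumes the 64 processors are arranged as an 8 x 8 grid and numbered in row-major order, neither of which the abstract states. The function names `mesh_neighbors` and `hop_count` are hypothetical. The point it illustrates is that each node is wired to at most four adjacent nodes, so no wire spans more than one tile, while a message between distant nodes crosses several short links.

```python
# Illustrative model of a 2-D mesh with nearest-neighbor links.
# Assumptions (not from the abstract): 8 x 8 layout, row-major node IDs.

def mesh_neighbors(node, rows=8, cols=8):
    """Return the IDs of the nodes directly wired to `node`."""
    r, c = divmod(node, cols)
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    # Keep only positions that fall inside the grid (edges have fewer links).
    return [nr * cols + nc
            for nr, nc in candidates
            if 0 <= nr < rows and 0 <= nc < cols]


def hop_count(src, dst, cols=8):
    """Minimum number of nearest-neighbor links between two nodes."""
    sr, sc = divmod(src, cols)
    dr, dc = divmod(dst, cols)
    return abs(sr - dr) + abs(sc - dc)


if __name__ == "__main__":
    print(mesh_neighbors(0))   # corner node: two links
    print(mesh_neighbors(9))   # interior node: four links
    print(hop_count(0, 63))    # opposite corners of the 8 x 8 grid
```

In this model the longest path, corner to corner, takes 14 hops, yet every individual link still connects adjacent tiles only.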