SCMP: a single-chip message-passing parallel computer

  • Authors:
  • James M. Baker, Jr.; Brian Gold; Mark Bucciero; Sidney Bennett; Rajneesh Mahajan; Priyadarshini Ramachandran; Jignesh Shah

  • Affiliations:
  • Department of Mathematics and Computer Science, Virginia Military Institute, Lexington, VA; Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA; Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA; Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA; Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA; Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA; Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA

  • Venue:
  • The Journal of Supercomputing - Special issue: Parallel and distributed processing and applications
  • Year:
  • 2004

Abstract

As technology improves and transistor feature sizes continue to shrink, the effect of on-chip interconnect wire latency on processor clock speed will become more important. In addition, as we reach the limits of the instruction-level parallelism that can be extracted from application programs, there will be an increased emphasis on thread-level parallelism. To continue to improve performance, computer architects will need to focus on architectures that efficiently support thread-level parallelism while minimizing the length of on-chip interconnect wires. The SCMP (Single-Chip Message-Passing) parallel computer system is one such architecture. The SCMP system includes up to 64 processors on a single chip, connected in a 2-D mesh with nearest-neighbor connections. Memory is included on-chip with the processors, and the architecture includes hardware support for communication and for the execution of parallel threads. Since there are no global signals or shared resources between the processors, the length of the interconnect wires is determined by the size of the individual processors, not the size of the entire chip. Avoiding long interconnect wires will allow the use of very high clock frequencies, which, when coupled with the use of multiple processors, will offer tremendous computational power.
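
The 2-D mesh described in the abstract can be illustrated with a small sketch. It is not taken from the paper: the 8 x 8 layout (one way to arrange 64 processors), the row-major node numbering, and the function names `node_coords`, `neighbors`, and `min_hops` are all assumptions made for illustration. The sketch shows that each node links only to the nodes directly beside it, so a message between distant nodes crosses several short links rather than one long wire.

```python
# Illustrative model of a 2-D mesh with nearest-neighbor links.
# Assumptions (not from the paper): an 8 x 8 layout and row-major node ids.
MESH_WIDTH = 8
MESH_HEIGHT = 8


def node_coords(node_id):
    """Map a row-major node id (hypothetical numbering) to (row, col)."""
    if not 0 <= node_id < MESH_WIDTH * MESH_HEIGHT:
        raise ValueError(f"node id {node_id} is outside the mesh")
    return divmod(node_id, MESH_WIDTH)


def neighbors(node_id):
    """Return the ids of the nodes directly north, south, west, and east.

    Nodes on an edge or corner have fewer than four neighbors because
    this sketch assumes a plain mesh with no wraparound links.
    """
    row, col = node_coords(node_id)
    result = []
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + d_row, col + d_col
        if 0 <= r < MESH_HEIGHT and 0 <= c < MESH_WIDTH:
            result.append(r * MESH_WIDTH + c)
    return result


def min_hops(src, dst):
    """Minimum number of nearest-neighbor links between two nodes.

    In a mesh without wraparound this is the Manhattan distance.
    """
    (r1, c1), (r2, c2) = node_coords(src), node_coords(dst)
    return abs(r1 - r2) + abs(c1 - c2)


if __name__ == "__main__":
    print(sorted(neighbors(0)))   # corner node: two neighbors
    print(sorted(neighbors(27)))  # interior node: four neighbors
    print(min_hops(0, 63))        # opposite corners of the mesh
```

Under these assumptions, a corner-to-corner message crosses 14 links, each spanning only the distance between adjacent processors. How SCMP actually routes messages between nodes is not stated in the abstract and is not modeled here.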