The architecture and programming of the Ametek series 2010 multicomputer

Authors:
C. L. Seitz;W. C. Athas;C. M. Flaig;A. J. Martin;J. Seizovic;C. S. Steele;W-K. Su
Affiliations:
Department of Computer Science, California Institute of Technology;Department of Computer Science, California Institute of Technology;Department of Computer Science, California Institute of Technology;Department of Computer Science, California Institute of Technology;Department of Computer Science, California Institute of Technology;Department of Computer Science, California Institute of Technology;Department of Computer Science, California Institute of Technology
Venue:
C3P Proceedings of the third conference on Hypercube concurrent computers and applications: Architecture, software, computer systems, and general issues - Volume 1
Year:
1988

Citing 5
Cited 54

The cosmic cube

Communications of the ACM - Special section on computer architecture
Fine grain concurrent computations
A VLSI Architecture for Concurrent Data Structures
The C Programmer's Abbreviated Guide to Multicomputer Programming
VLSI Mesh Routing Systems

Crystal: from functional description to efficient parallel code

C3P Proceedings of the third conference on Hypercube concurrent computers and applications: Architecture, software, computer systems, and general issues - Volume 1
A message-passing model for highly concurrent computation

C3P Proceedings of the third conference on Hypercube concurrent computers and applications: Architecture, software, computer systems, and general issues - Volume 1
Optimal matrix algorithms on homogeneous hypercubes

C3P Proceedings of the third conference on Hypercube concurrent computers and applications - Volume 2
Exploring the benefits of multiple hardware contexts in a multiprocessor architecture: preliminary results

ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Performance evaluation of mesh-connected wormhole-routed networks for interprocessor communication in multicomputers

Proceedings of the 1990 ACM/IEEE conference on Supercomputing
A message passing coprocessor for distributed memory multicomputers

Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Express Cubes: Improving the Performance of k-ary n-cube Interconnection Networks

IEEE Transactions on Computers
Planar-adaptive routing: low-cost adaptive networks for multiprocessors

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
The turn model for adaptive routing

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
ComPaSS: efficient communication services for scalable architectures

Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Fast deflection routing for packets and worms

PODC '93 Proceedings of the twelfth annual ACM symposium on Principles of distributed computing
Improving AP1000 parallel computer performance with message communication

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Increasing network bandwidth on meshes

SPAA '94 Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures
Segment router: a novel router design for parallel computers

SPAA '94 Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures
The turn model for adaptive routing

Journal of the ACM (JACM)
Unicast-Based Multicast Communication in Wormhole-Routed Networks

IEEE Transactions on Parallel and Distributed Systems
Storage-Efficient, Deadlock-Free Packet Routing Algorithms for Torus Networks

IEEE Transactions on Computers
The Chaos Router

IEEE Transactions on Computers
Planar-adaptive routing: low-cost adaptive networks for multiprocessors

Journal of the ACM (JACM)
Resource Placement with Multiple Adjacency Constraints in k-ary n-Cubes

IEEE Transactions on Parallel and Distributed Systems
Universal congestion control for meshes

Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
A Theory of Wormhole Routing in Parallel Computers

IEEE Transactions on Computers
On the benefit of supporting virtual channels in wormhole routers

Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
Parallel Divide and Conquer on Meshes

IEEE Transactions on Parallel and Distributed Systems
Resource Placement in Torus-Based Networks

IEEE Transactions on Computers
A Cost and Speed Model for k-ary n-Cube Wormhole Routers

IEEE Transactions on Parallel and Distributed Systems
The turn model for adaptive routing

25 years of the international symposia on Computer architecture (selected papers)
Cyclic-Cubes: A New Family of Interconnection Networks of Even Fixed-Degrees

IEEE Transactions on Parallel and Distributed Systems
Supporting systolic and memory communication in iWarp

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Fault-Tolerant Adaptive and Minimal Routing in Mesh-Connected Multicomputers Using Extended Safety Levels

IEEE Transactions on Parallel and Distributed Systems
A Fault-Tolerant Routing Scheme for Meshes with Nonconvex Faults

IEEE Transactions on Parallel and Distributed Systems
Fast Gossiping in Square Meshes/Tori with Bounded-Size Packets

IEEE Transactions on Parallel and Distributed Systems
An Analysis of Edge Fault Tolerance in Recursively Decomposable Regular Networks

IEEE Transactions on Computers
Lee Distance and Topological Properties of k-ary n-cubes

IEEE Transactions on Computers
Limits on Interconnection Network Performance

IEEE Transactions on Parallel and Distributed Systems
Virtual-Channel Flow Control

IEEE Transactions on Parallel and Distributed Systems
A Network Flow Model for Load Balancing in Circuit-Switched Multicomputers

IEEE Transactions on Parallel and Distributed Systems
Multicast Communication in Multicomputer Networks

IEEE Transactions on Parallel and Distributed Systems
Deadlock-Free Multicast Wormhole Routing in 2-D Mesh Multicomputers

IEEE Transactions on Parallel and Distributed Systems
A Rectilinear-Monotone Polygonal Fault Block Model for Fault-Tolerant Minimal Routing in Mesh

IEEE Transactions on Computers
Resource Placement in Torus-Based Networks

IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
A Limited-Global Fault Information Model for Dynamic Routing in 2-D Meshes

IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Spare processor allocation for fault tolerance in torus-based multicomputers

FTCS '96 Proceedings of the Twenty-Sixth Annual International Symposium on Fault-Tolerant Computing (FTCS '96)
Resource Placements in 2D Tori

IPPS '98 Proceedings of the 12th International Parallel Processing Symposium
On resource placements in 3D tori

Journal of Parallel and Distributed Computing - Special section best papers from the 2002 international parallel and distributed processing symposium
Perfect Distance-d Placements in 2D Toroidal Networks

The Journal of Supercomputing
A heuristic fault-tolerant routing algorithm in mesh using rectilinear-monotone polygonal fault blocks

Journal of Systems Architecture: the EUROMICRO Journal
Optimal gossiping in square 2D meshes

Theoretical Computer Science
Extended minimal routing in 2-D meshes with faulty blocks

International Journal of High Performance Computing and Networking
A dynamic programming algorithm for simulation of a multi-dimensional torus in a crossed cube

Information Sciences: an International Journal
On a deadlock and performance analysis of ALBR and DAR algorithm on X-Torus topology by optimal utilization of Cross Links and minimal lookups

The Journal of Supercomputing
X-torus: a variation of torus topology with lower diameter and larger bisection width

ICCSA'06 Proceedings of the 2006 international conference on Computational Science and Its Applications - Volume Part V
Fluid mechanics: Implementation of a three-dimensional Navier-Stokes code on the Symult Series 2010

Mathematical and Computer Modelling: An International Journal
Congestion-aware ant colony based routing algorithms for efficient application execution on Network-on-Chip platform

Expert Systems with Applications: An International Journal

Abstract

During the period following the completion of the Cosmic Cube experiment [1], and while commercial descendants of this first-generation multicomputer (message-passing concurrent computer) were spreading through a community that includes many of the attendees of this conference, members of our research group were developing a set of ideas about the physical design and programming for the second generation of medium-grain multicomputers.
Our principal goal was to improve by as much as two orders of magnitude the relationship between message-passing and computing performance, and also to make the topology of the message-passing network practically invisible. Decreasing the communication latency relative to instruction execution times extends the application span of multicomputers from easily partitioned and distributed problems (e.g., matrix computations, PDE solvers, finite element analysis, finite difference methods, distant or local field many-body problems, FFTs, ray tracing, distributed simulation of systems composed of loosely coupled physical processes) to computing problems characterized by “high flux” [2] or relatively fine-grain concurrent formulations [3, 4] (e.g., searching, sorting, concurrent data structures, graph problems, signal processing, image processing, and distributed simulation of systems composed of many tightly coupled physical processes). Such applications place heavy demands on the message-passing network for high bandwidth, low latency, and non-local communication.
Decreased message latency also improves the efficiency of the class of applications that have been developed on first-generation systems, and the insensitivity of message latency to process placement simplifies the concurrent formulation of application programs.
Our other goals included a streamlined and easily layered set of message primitives, a node operating system based on a reactive programming model, open interfaces for accelerators and peripheral devices, and node performance improvements that could be achieved economically by using the same technology employed in contemporary workstation computers.
By the autumn of 1986, these ideas had become sufficiently developed, molded together, and tested through simulation to be regarded as a complete architectural design. We were fortunate that the Ametek Computer Research Division was ready and willing to work with us to develop this system as a commercial product. The Ametek Series 2010 multicomputer is the result of this joint effort.