During the period following the completion of the Cosmic Cube experiment [1], and while commercial descendants of this first-generation multicomputer (message-passing concurrent computer) were spreading through a community that includes many of the attendees of this conference, members of our research group were developing a set of ideas about the physical design and programming for the second generation of medium-grain multicomputers.
Our principal goal was to improve by as much as two orders of magnitude the relationship between message-passing and computing performance, and also to make the topology of the message-passing network practically invisible. Decreasing the communication latency relative to instruction execution times extends the application span of multicomputers from easily partitioned and distributed problems (e.g., matrix computations, PDE solvers, finite element analysis, finite difference methods, distant or local field many-body problems, FFTs, ray tracing, distributed simulation of systems composed of loosely coupled physical processes) to computing problems characterized by “high flux” [2] or relatively fine-grain concurrent formulations [3, 4] (e.g., searching, sorting, concurrent data structures, graph problems, signal processing, image processing, and distributed simulation of systems composed of many tightly coupled physical processes). Such applications place heavy demands on the message-passing network for high bandwidth, low latency, and non-local communication. Decreased message latency also improves the efficiency of the class of applications that have been developed on first-generation systems, and the insensitivity of message latency to process placement simplifies the concurrent formulation of application programs.
Our other goals included a streamlined and easily layered set of message primitives, a node operating system based on a reactive programming model, open interfaces for accelerators and peripheral devices, and node performance improvements that could be achieved economically by using the same technology employed in contemporary workstation computers.
By the autumn of 1986, these ideas had become sufficiently developed, molded together, and tested through simulation to be regarded as a complete architectural design. We were fortunate that the Ametek Computer Research Division was ready and willing to work with us to develop this system as a commercial product. The Ametek Series 2010 multicomputer is the result of this joint effort.