The network architecture of the Connection Machine CM-5 (extended abstract)

Authors:
Charles E. Leiserson;Zahi S. Abuhamdeh;David C. Douglas;Carl R. Feynman;Mahesh N. Ganmukhi;Jeffrey V. Hill;Daniel Hillis;Bradley C. Kuszmaul;Margaret A. St. Pierre;David S. Wells;Monica C. Wong;Shaw-Wen Yang;Robert Zak
Venue:
SPAA '92 Proceedings of the fourth annual ACM symposium on Parallel algorithms and architectures
Year:
1992

Citing 10
Cited 124

The cosmic cube

Communications of the ACM - Special section on computer architecture
Fat-trees: universal networks for hardware-efficient supercomputing

IEEE Transactions on Computers
Data parallel algorithms

Communications of the ACM - Special issue on parallelism
Deadlock-Free Message Routing in Multiprocessor Interconnection Networks

IEEE Transactions on Computers
The fuzzy barrier: a mechanism for high speed synchronization of processors

ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Computer architecture: a quantitative approach

Computer architecture: a quantitative approach
Vector models for data-parallel computing

Vector models for data-parallel computing
Scalable Shared-Memory Multiprocessor Architectures

Computer
An IEEE 1149.1 Compliant Testability Architecture with Internal Scan

ICCD '92 Proceedings of the 1992 IEEE International Conference on Computer Design on VLSI in Computers & Processors
Functional VLSI Design Verification Methodology for the CM-5 Massively Parallel Supercomputer

ICCD '92 Proceedings of the 1992 IEEE International Conference on Computer Design on VLSI in Computers & Processors

A tightly-coupled processor-network interface

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Disseminating critical target-specific synchronization information in parallel discrete event simulations

PADS '93 Proceedings of the seventh workshop on Parallel and distributed simulation
The CM-5 Connection Machine: a scalable supercomputer

Communications of the ACM
Does your workstation computation belong on a vector supercomputer?

Communications of the ACM
An atomic model for message-passing

SPAA '93 Proceedings of the fifth annual ACM symposium on Parallel algorithms and architectures
Bounds on the efficiency of message-passing protocols for parallel computers

SPAA '93 Proceedings of the fifth annual ACM symposium on Parallel algorithms and architectures
Supporting sets of arbitrary connections on iWarp through communication context switches

SPAA '93 Proceedings of the fifth annual ACM symposium on Parallel algorithms and architectures
Randomized routing with shorter paths

SPAA '93 Proceedings of the fifth annual ACM symposium on Parallel algorithms and architectures
Communication and computation performance of the CM-5

Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Performance of the CM-5 scalable file system

ICS '94 Proceedings of the 8th international conference on Supercomputing
An architecture for optimal all-to-all personalized communication

SPAA '94 Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures
Cost/performance of a parallel computer simulator

PADS '94 Proceedings of the eighth workshop on Parallel and distributed simulation
Request Combining in Multiprocessors with Arbitrary Interconnection Networks

IEEE Transactions on Parallel and Distributed Systems
The Fat-Pyramid and Universal Parallel Computation Independent of Wire Delay

IEEE Transactions on Computers
Virtual memory mapped network interface for the SHRIMP multicomputer

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Tempest and typhoon: user-level shared memory

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Software overhead in messaging layers: where does the time go?

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Fine-grain access control for distributed shared memory

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Optimal evaluation of array expressions on massively parallel machines

ACM Transactions on Programming Languages and Systems (TOPLAS)
The SP2 high-performance switch

IBM Systems Journal
High-level optimization via automated statistical modeling

PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Cilk: an efficient multithreaded runtime system

PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Optimistic active messages: a mechanism for scheduling communication with computation

PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Parallel algorithms for the circuit value update problem

Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
Efficient techniques for fast nested barrier synchronization

Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
Universal congestion control for meshes

Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
ROMM routing on mesh and torus networks

Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
The interaction of parallel and sequential workloads on a network of workstations

Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
CRL: high-performance all-software distributed shared memory

SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Multicast virtual topologies for collective communication in MPCs and ATM clusters

Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
NIFDY: a low overhead, high throughput network interface

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Optimizing memory system performance for communication in parallel computers

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Empirical evaluation of the CRAY-T3D: a compiler perspective

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Petri net modeling of interconnection networks for massively parallel architectures

ICS '95 Proceedings of the 9th international conference on Supercomputing
Performance evaluation of a parallel I/O architecture

ICS '95 Proceedings of the 9th international conference on Supercomputing
Randomized Routing with Shorter Paths

IEEE Transactions on Parallel and Distributed Systems
Coherent network interfaces for fine-grain communication

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Early experience with message-passing on the SHRIMP multicomputer

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Evaluation of architectural support for global address-based communication in large-scale parallel machines

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Improved methods for hiding latency in high bandwidth networks (extended abstract)

Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
On the benefit of supporting virtual channels in wormhole routers

Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
On trading task reallocation for thread management in partitionable multiprocessors

Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
Synchronization hardware for networks of workstations: performance vs. cost

ICS '96 Proceedings of the 10th international conference on Supercomputing
Automatic methods for hiding latency in high bandwidth networks (extended abstract)

STOC '96 Proceedings of the twenty-eighth annual ACM symposium on Theory of computing
Modeling cost/performance of a parallel computer simulator

ACM Transactions on Modeling and Computer Simulation (TOMACS)
Compressionless Routing: A Framework for Adaptive and Fault-Tolerant Routing

IEEE Transactions on Parallel and Distributed Systems
Universal Wormhole Routing

IEEE Transactions on Parallel and Distributed Systems
Performance of Multistage Bus Networks for a Distributed Shared Memory Multiprocessor

IEEE Transactions on Parallel and Distributed Systems
Performance Evaluation of Switch-Based Wormhole Networks

IEEE Transactions on Parallel and Distributed Systems
Ace: linguistic mechanisms for customizable protocols

PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Implementing multidestination worms in switch-based parallel systems: architectural alternatives and their impact

Proceedings of the 24th annual international symposium on Computer architecture
Effects of communication latency, overhead, and bandwidth in a cluster architecture

Proceedings of the 24th annual international symposium on Computer architecture
Efficient synchronization: let them eat QOLB

Proceedings of the 24th annual international symposium on Computer architecture
Barrier inference

POPL '98 Proceedings of the 25th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Thread scheduling for multiprogrammed multiprocessors

Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
Active pages: a computation model for intelligent memory

Proceedings of the 25th annual international symposium on Computer architecture
Adapting the Network Interface for High-Performance Computing: The CNI Approach

The Journal of Supercomputing - Special issue: high performance distributed computing
Designing Tree-Based Barrier Synchronization on 2D Mesh Networks

IEEE Transactions on Parallel and Distributed Systems
Virtual memory mapped network interface for the SHRIMP multicomputer

25 years of the international symposia on Computer architecture (selected papers)
Tempest and typhoon: user-level shared memory

25 years of the international symposia on Computer architecture (selected papers)
A quantitative comparison of parallel computation models

ACM Transactions on Computer Systems (TOCS)
Efficient Broadcast and Multicast on Multistage Interconnection Networks Using Multiport Encoding

IEEE Transactions on Parallel and Distributed Systems
Wormhole routing techniques for directly connected multicomputer systems

ACM Computing Surveys (CSUR)
Realizing Common Communication Patterns in Partitioned Optical Passive Stars (POPS) Networks

IEEE Transactions on Computers
Multicast snooping: a new coherence method using a multicast address network

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Design challenges of virtual networks: fast, general-purpose communication

Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Broadcasting Multiple Messages in the Multiport Model

IEEE Transactions on Parallel and Distributed Systems
The QRQW PRAM: accounting for contention in parallel algorithms

SODA '94 Proceedings of the fifth annual ACM-SIAM symposium on Discrete algorithms
Workload Execution Strategies and Parallel Speedup on Clustered Computers

IEEE Transactions on Computers
On Interaction between Interconnection Network Design and Latency Hiding Techniques in Multiprocessors

The Journal of Supercomputing
Implementing Multidestination Worms in Switch-Based Parallel Systems: Architectural Alternatives and Their Impact

IEEE Transactions on Parallel and Distributed Systems
Optimistic active messages: structuring systems for high-performance communication

EW 6 Proceedings of the 6th workshop on ACM SIGOPS European workshop: Matching operating systems to application needs
Integrated Network Barriers

IEEE Transactions on Parallel and Distributed Systems
Integrated Performance Models for SPMD Applications and MIMD Architectures

IEEE Transactions on Parallel and Distributed Systems
Modeling of interconnection subsystems for massively parallel computers

Performance Evaluation
Parallel FFT on ATM-based networks of workstations

Cluster Computing
Optimal software multicast in wormhole-routed multistage networks

Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Exploiting Redundancy to Speed Up Parallel Systems

IEEE Parallel & Distributed Technology: Systems & Technology
Parallel I/O Subsystems in Massively Parallel Supercomputers

IEEE Parallel & Distributed Technology: Systems & Technology
Inside Parallel Computers: Trends in Interconnection Networks

IEEE Computational Science & Engineering
An Efficient, Protected Message Interface

Computer
Virtual-Memory-Mapped Network Interfaces

IEEE Micro
Computing Global Combine Operations in the Multiport Postal Model

IEEE Transactions on Parallel and Distributed Systems
Optimal Software Multicast in Wormhole-Routed Multistage Networks

IEEE Transactions on Parallel and Distributed Systems
On Message-Dependent Deadlocks in Multiprocessor/Multicomputer Systems

HiPC '00 Proceedings of the 7th International Conference on High Performance Computing
Network Performance under Physical Constraints

ICPP '97 Proceedings of the international Conference on Parallel Processing
An Improved Analytical Model for Wormhole Routed Networks with Application to Butterfly Fat-Trees

ICPP '97 Proceedings of the international Conference on Parallel Processing
Broadcasting Multiple Messages in the Multiport Model

IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Adaptive Source Routing in Multistage Interconnection Networks

IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Partitionability of the Multistage Interconnection Networks

IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Dag-Consistent Distributed Shared Memory

IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
A Hybrid Time Synchronization Implemented Through Special Ring Array for Mesh or Torus

IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
k-ary n-trees: High Performance Networks for Massively Parallel Architectures

IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
A Reliable Hardware Barrier Synchronization Scheme

IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Deadlock-Free Fault-tolerant Routing in the Multi-dimensional Crossbar Network and Its Implementation for the Hitachi SR2201

IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Separated high-bandwidth and low-latency communication in the cluster interconnect Clint

Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Exploitation of parallelism in group probing for testing massively parallel processing systems

ATS '95 Proceedings of the 4th Asian Test Symposium
Area-Universal Circuits with Constant Slowdown

ARVLSI '99 Proceedings of the 20th Anniversary Conference on Advanced Research in VLSI
Toward high communication performance through compiled communications on a circuit switched interconnection network

HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Protected, user-level DMA for the SHRIMP network interface

HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
CNI: A High-Performance Network Interface for Workstation Clusters

HPDC '96 Proceedings of the 5th IEEE International Symposium on High Performance Distributed Computing
An Augmented k-ary Tree Multiprocessor with Real-Time Fault-Tolerant Capability

The Journal of Supercomputing
Distributed Resolution of Network Congestion and Potential Deadlock Using Reservation-Based Scheduling

IEEE Transactions on Parallel and Distributed Systems
Fast synchronization for chip multiprocessors

ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
A defect tolerant self-organizing nanoscale SIMD architecture

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Level-wise scheduling algorithm for fat tree interconnection networks

Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Exploiting Fine-Grained Data Parallelism with Chip Multiprocessors and Fast Barriers

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Kernel support for the Wisconsin wind tunnel

moas'93 USENIX Microkernels and Other Kernel Architectures Symposium - Volume 4
Supporting tasks with adaptive groups in data parallel programming

International Journal of Computational Science and Engineering
Polymorphic On-Chip Networks

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
A scalable, commodity data center network architecture

Proceedings of the ACM SIGCOMM 2008 conference on Data communication
Area-time tradeoffs for universal VLSI circuits

Theoretical Computer Science
Exploring pattern-aware routing in generalized fat tree networks

Proceedings of the 23rd international conference on Supercomputing
Reducing complexity in tree-like computer interconnection networks

Parallel Computing
A low-power fat tree-based optical network-on-chip for multiprocessor system-on-chip

Proceedings of the Conference on Design, Automation and Test in Europe
TLSync: support for multiple fast barriers using on-chip transmission lines

Proceedings of the 38th annual international symposium on Computer architecture
Hardware support for OpenMP collective operations

LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
SGL: towards a bridging model for heterogeneous hierarchical platforms

International Journal of High Performance Computing and Networking
Bandwidth-optimal all-to-all exchanges in fat tree networks

Proceedings of the 27th ACM international conference on Supercomputing
Fast pattern-specific routing for fat tree networks

ACM Transactions on Architecture and Code Optimization (TACO)
All routes to efficient datacenter fabrics

Proceedings of the 8th International Workshop on Interconnection Network Architecture: On-Chip, Multi-Chip
Parallel Algorithms for the Circuit Value Update Problem

Theory of Computing Systems
