The NYU Ultracomputer: Designing an MIMD Shared Memory Parallel Computer

Authors:
A. Gottlieb; R. Grishman; C. P. Kruskal; K. P. McAuliffe; L. Rudolph; M. Snir
Affiliations:
Courant Institute of Mathematical Sciences, New York University
Venue:
IEEE Transactions on Computers
Year:
1983

Abstract

We present the design for the NYU Ultracomputer, a shared-memory MIMD parallel machine composed of thousands of autonomous processing elements. This machine uses an enhanced message switching network with the geometry of an Omega-network to approximate the ideal behavior of Schwartz's paracomputer model of computation and to implement efficiently the important fetch-and-add synchronization primitive. We outline the hardware that would be required to build a 4096 processor system using 1990s technology. We also discuss system software issues, and present analytic studies of the network performance. Finally, we include a sample of our effort to implement and simulate parallel variants of important scientific programs.
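
The abstract names fetch-and-add without defining it. As a rough illustration of the usual semantics (atomically add an increment to a shared cell and return the value it held beforehand), here is a minimal software sketch. The class `SharedCounter` and the function `claim_slots` are hypothetical names chosen for this example, and the lock only stands in for atomicity; it says nothing about how the Ultracomputer's network implements the primitive.

```python
import threading


class SharedCounter:
    """Software emulation of one fetch-and-add cell (hypothetical helper)."""

    def __init__(self, value=0):
        self._value = value
        # Assumption: a lock stands in for the atomicity the hardware provides.
        self._lock = threading.Lock()

    def fetch_and_add(self, increment):
        """Atomically add `increment`; return the value held beforehand."""
        with self._lock:
            old = self._value
            self._value = old + increment
            return old

    @property
    def value(self):
        with self._lock:
            return self._value


def claim_slots(num_workers=8, claims_per_worker=100):
    """Each worker claims slot indices from one shared counter."""
    counter = SharedCounter(0)
    claimed = [[] for _ in range(num_workers)]

    def worker(worker_id):
        for _ in range(claims_per_worker):
            # Every call returns a distinct old value, so no two workers
            # ever receive the same slot index.
            claimed[worker_id].append(counter.fetch_and_add(1))

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    all_slots = [slot for per_worker in claimed for slot in per_worker]
    return counter.value, all_slots


final_value, slots = claim_slots()
```

Because each call hands back a different prior value, concurrent workers can share out indices (queue positions, loop iterations, buffer slots) without a separate critical section around the allocation step.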