Applications considerations in the system design of highly concurrent multiprocessors

Authors: S. F. Lundstrom
Affiliations: Parallel Technology, Stanford, CA
Venue: IEEE Transactions on Computers
Year: 1987



Abstract

A five-year series of studies, which ended in 1982 and was supported in part by NASA and in part by Burroughs Corporation, led to the system design of a very large, very high-speed multiprocessor. The system was intended to solve large scientific problems, especially modeling problems such as those in computational aerodynamics. The performance objective was to sustain execution rates of up to one billion floating-point operations per second on problems requiring 40 million words of main memory. The viability of the design depended on an in-depth understanding of the system's projected applications. An overview of the project objectives and the resulting 128-processor design is presented, showing the local private memories available to each processor, the 64-million-word shared memory, the dual-omega interconnection network, and the important programming concepts. During the design, studies were conducted to determine the number of processors (a tradeoff against individual processor speed), the memory organization (program and data, private and shared), and the structure of the networks that interconnect the processor and memory resources. These studies and the important application-related considerations are presented. Although the system was never constructed and tested, it was extensively simulated, and the design was completed in sufficient detail to develop a reasonably accurate parts list and implementation plan.
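
For readers unfamiliar with omega networks, the following is a minimal sketch of textbook destination-tag routing through a single omega network, together with the per-processor rate implied by the stated figures. It is not taken from the paper. The abstract does not say how the dual-omega network is built or routed, so the 2x2 switches, the 128-port size, and the names `omega_route` and `per_processor_rate` are all assumptions made for illustration.

```python
from typing import List


def omega_route(src: int, dst: int, n_ports: int = 128) -> List[int]:
    """Return the port position after each stage of a textbook omega network.

    Assumes (hypothetically) 2x2 switches and a power-of-two port count;
    the paper's actual dual-omega design may differ.
    """
    stages = n_ports.bit_length() - 1
    if n_ports < 2 or (1 << stages) != n_ports:
        raise ValueError("n_ports must be a power of two >= 2")
    if not (0 <= src < n_ports and 0 <= dst < n_ports):
        raise ValueError("src and dst must be valid port numbers")

    mask = n_ports - 1
    pos = src
    path = []
    for stage in range(stages):
        # Perfect shuffle: rotate the address bits left by one.
        pos = ((pos << 1) | (pos >> (stages - 1))) & mask
        # The 2x2 switch picks its output from the destination bit,
        # most significant bit first.
        bit = (dst >> (stages - 1 - stage)) & 1
        pos = (pos & ~1) | bit
        path.append(pos)
    return path


def per_processor_rate(total_flops: float = 1e9, processors: int = 128) -> float:
    """Average rate each processor must sustain to meet the total target."""
    return total_flops / processors


if __name__ == "__main__":
    print(omega_route(5, 100))
    print(per_processor_rate())
```

Under these assumptions a 128-port network needs log2(128) = 7 switching stages, and the one-billion-operations-per-second target averages to roughly 7.8 million floating-point operations per second per processor, which gives a sense of the processor-count versus processor-speed tradeoff the abstract mentions.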