THE MIT ALEWIFE MACHINE: A LARGE-SCALE DISTRIBUTED-MEMORY MULTIPROCESSOR

Authors:
A. Agarwal;D. Chaiken;K. Johnson;D. Kranz;J. Kubiatowicz;K. Kurihara;B. H. Lim;G. Maa;D. Nussbaum;M. Parkin;D. Yeung
Venue:
THE MIT ALEWIFE MACHINE: A LARGE-SCALE DISTRIBUTED-MEMORY MULTIPROCESSOR
Year:
1991

Citing 0
Cited 47

The impact of communication locality on large-scale multiprocessor performance

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Low contention load balancing on large-scale multiprocessors

SPAA '92 Proceedings of the fourth annual ACM symposium on Parallel algorithms and architectures
A tightly-coupled processor-network interface

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Closing the window of vulnerability in multiphase memory transactions

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Waiting algorithms for synchronization in large-scale multiprocessors

ACM Transactions on Computer Systems (TOCS)
Experience with fine-grain synchronization in MIMD machines for preconditioned conjugate gradient

PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Computation migration: enhancing locality for distributed-memory parallel systems

PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Integrating message-passing and shared-memory: early experience

ACM SIGPLAN Notices - Workshop on languages, compilers and run-time environments for distributed memory multiprocessors
Transactional memory: architectural support for lock-free data structures

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Anatomy of a message in the Alewife multiprocessor

ICS '93 Proceedings of the 7th international conference on Supercomputing
The NuMesh: a modular, scalable communications substrate

ICS '93 Proceedings of the 7th international conference on Supercomputing
Super-threading: architectural and software mechanisms for optimizing parallel computation

ICS '93 Proceedings of the 7th international conference on Supercomputing
T: integrated building blocks for parallel computing

Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Diffracting trees (preliminary version)

SPAA '94 Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures
Counting networks

Journal of the ACM (JACM)
Virtual memory mapped network interface for the SHRIMP multicomputer

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
The Stanford FLASH multiprocessor

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Reactive synchronization algorithms for multiprocessors

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Integration of message passing and shared memory in the Stanford FLASH multiprocessor

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
The performance impact of flexibility in the Stanford FLASH multiprocessor

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Software caching and computation migration in Olden

PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Data and computation transformations for multiprocessors

PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
ROMM routing on mesh and torus networks

Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
Software transactional memory

Proceedings of the fourteenth annual ACM symposium on Principles of distributed computing
COMA: an opportunity for building fault-tolerant scalable shared memory multiprocessors

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Diffracting trees

ACM Transactions on Computer Systems (TOCS)
Operating system support for improving data locality on CC-NUMA compute servers

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
A steady state analysis of diffracting trees (extended abstract)

Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
Counting networks are practically linearizable

PODC '96 Proceedings of the fifteenth annual ACM symposium on Principles of distributed computing
Compressionless Routing: A Framework for Adaptive and Fault-Tolerant Routing

IEEE Transactions on Parallel and Distributed Systems
Reactive diffracting trees

Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
A Cost and Speed Model for k-ary n-Cube Wormhole Routers

IEEE Transactions on Parallel and Distributed Systems
Combining funnels: a new twist on an old tale…

PODC '98 Proceedings of the seventeenth annual ACM symposium on Principles of distributed computing
Flexible use of memory for replication/migration in cache-coherent DSM multiprocessors

Proceedings of the 25th annual international symposium on Computer architecture
Virtual memory mapped network interface for the SHRIMP multicomputer

25 years of the international symposia on Computer architecture (selected papers)
The Stanford FLASH multiprocessor

25 years of the international symposia on Computer architecture (selected papers)
Scalable concurrent priority queue algorithms

Proceedings of the eighteenth annual ACM symposium on Principles of distributed computing
SimpleFit: A Framework for Analyzing Design Trade-Offs in Raw Architectures

IEEE Transactions on Parallel and Distributed Systems
Generalized multiprocessor scheduling for directed acyclic graphs

Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Tolerating node failures in cache only memory architectures

Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Sparcle: An Evolutionary Processor Design for Large-Scale Multiprocessors

IEEE Micro
Hierarchical Compilation of Macro Dataflow Graphs for Multiprocessors with Local Memory

IEEE Transactions on Parallel and Distributed Systems
Software Techniques for Improving MPP Bulk-Transfer Performance

IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
The Design and Simulation of the PACE Prototype Architecture

MASCOTS '96 Proceedings of the 4th International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems
Comparative Study of Parallel vs. Distributed Genetic Algorithm Implementation for ATM Networking Environment

ISCC '00 Proceedings of the Fifth IEEE Symposium on Computers and Communications (ISCC 2000)
Modeling and evaluating the time overhead induced by BER in COMA multiprocessors

Journal of Systems Architecture: the EUROMICRO Journal
Panda: a portable platform to support parallel programming languages

Sedms'93 USENIX Systems on USENIX Experiences with Distributed and Multiprocessor Systems - Volume 4

Abstract

The Alewife multiprocessor project focuses on the architecture and design of a large-scale parallel machine. The machine uses a low-dimension direct interconnection network to provide scalable communication bandwidth while allowing the exploitation of locality. Despite its distributed-memory architecture, Alewife allows efficient shared-memory programming through a multilayered approach to locality management. A new scalable cache coherence scheme called LimitLESS directories allows the use of caches to reduce communication latency and network bandwidth requirements. Alewife also employs run-time and compile-time methods for partitioning and placement of data and processes to enhance communication locality. While these methods attempt to minimize communication latency, communication with distant processors cannot be completely avoided. Alewife's processor, Sparcle, is designed to tolerate these latencies by rapidly switching between threads of computation. This paper describes the Alewife architecture and concentrates on the novel hardware features of the machine, including LimitLESS directories and the rapid context-switching processor.
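
The abstract names LimitLESS directories without describing the mechanism. The sketch below is a rough illustration only. It assumes the commonly described idea of a directory entry with a small, fixed number of hardware sharer pointers, with overflow handled by software on the home node. The pointer count (HW_POINTERS = 4), the class name DirectoryEntry, and the trap counter are all hypothetical, and the model ignores protocol states, messages, and timing. It is not the Alewife implementation.

```python
from dataclasses import dataclass, field

HW_POINTERS = 4  # assumed number of hardware pointers per entry (hypothetical)


@dataclass
class DirectoryEntry:
    """Sharer list for one memory block (illustrative model only)."""
    hw_sharers: list = field(default_factory=list)  # limited hardware pointers
    sw_sharers: set = field(default_factory=set)    # software-managed overflow
    traps: int = 0                                  # software interventions

    def add_reader(self, node: int) -> None:
        """Record that `node` holds a read-only copy of the block."""
        if node in self.hw_sharers or node in self.sw_sharers:
            return
        if len(self.hw_sharers) < HW_POINTERS:
            self.hw_sharers.append(node)  # common case: fits in hardware
        else:
            self.traps += 1               # overflow: software records the sharer
            self.sw_sharers.add(node)

    def write(self, writer: int) -> set:
        """Return the nodes to invalidate before `writer` gains exclusive access."""
        to_invalidate = (set(self.hw_sharers) | self.sw_sharers) - {writer}
        if self.sw_sharers:
            self.traps += 1               # software walks the overflow list
        self.hw_sharers = [writer]
        self.sw_sharers = set()
        return to_invalidate


entry = DirectoryEntry()
for node in range(6):          # six readers, two more than the hardware holds
    entry.add_reader(node)
invalidated = entry.write(writer=0)
```

In this toy model, blocks shared by few nodes never leave the hardware path, and only widely shared blocks pay the software cost. That is the trade-off the sketch is meant to show, under the assumptions stated above.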