The J-machine multicomputer: an architectural evaluation

Authors:
Michael D. Noakes; Deborah A. Wallach; William J. Dally
Venue:
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Year:
1993

Abstract

The MIT J-Machine multicomputer has been constructed to study the role of a set of primitive mechanisms in providing efficient support for parallel computing. Each J-Machine node consists of an integrated multicomputer component, the Message-Driven Processor (MDP), and 1 MByte of DRAM. The MDP provides mechanisms to support efficient communication, synchronization, and naming. A 512-node J-Machine is operational and is due to be expanded to 1024 nodes in March 1993. In this paper, we discuss the design of the J-Machine and evaluate the effectiveness of the mechanisms incorporated into the MDP. We measure the performance of the communication and synchronization mechanisms directly and investigate the behavior of four complete applications.