Performance Analysis of k-ary n-cube Interconnection Networks
IEEE Transactions on Computers
Processor coupling: integrating compile time and runtime scheduling for parallelism
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Active messages: a mechanism for integrated communication and computation
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Limits on Interconnection Network Performance
IEEE Transactions on Parallel and Distributed Systems
A Mechanism for Efficient Context Switching
ICCD '91 Proceedings of the 1991 IEEE International Conference on Computer Design on VLSI in Computer & Processors
The incremental garbage collection of processes
Proceedings of the 1977 symposium on Artificial intelligence and programming languages
A Concurrent Smalltalk Compiler for the Message-Driven Processor
A Concurrent Smalltalk Compiler for the Message-Driven Processor
Issues and directions in scalable parallel computing
PODC '93 Proceedings of the twelfth annual ACM symposium on Principles of distributed computing
Fault-tolerant wormhole routing in tori
ICS '94 Proceedings of the 8th international conference on Supercomputing
Increasing network bandwidth on meshes
SPAA '94 Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures
The Stanford FLASH multiprocessor
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
The performance impact of flexibility in the Stanford FLASH multiprocessor
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Universal congestion control for meshes
Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
ROMM routing on mesh and torus networks
Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
The MIT Alewife machine: architecture and performance
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
The EM-X parallel computer: architecture and basic performance
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Configurable flow control mechanisms for fault-tolerant routing
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
A partitioning-independent paradigm for nested data parallelism
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Proceedings of the 28th annual international symposium on Microarchitecture
A Framework for Designing Deadlock-Free Wormhole Routing Algorithms
IEEE Transactions on Parallel and Distributed Systems
Polling watchdog: combining polling and interrupts for efficient message handling
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Designing Clustered Multiprocessor Systems under Packaging and Technological Advancements
IEEE Transactions on Parallel and Distributed Systems
Synchronization and communication in the T3E multiprocessor
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
On the benefit of supporting virtual channels in wormhole routers
Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
A Broadcast Algorithm for All-Port Wormhole-Routed Torus Networks
IEEE Transactions on Parallel and Distributed Systems
Scheduler-conscious synchronization
ACM Transactions on Computer Systems (TOCS)
The Performance of the Cedar Multistage Switching Network
IEEE Transactions on Parallel and Distributed Systems
Performance Analysis of Buffering Schemes in Wormhole Routers
IEEE Transactions on Computers
Resource Placement in Torus-Based Networks
IEEE Transactions on Computers
A Fully Adaptive Routing Algorithm for Dynamically Injured Hypercubes, Meshes, and Tori
IEEE Transactions on Parallel and Distributed Systems
Design choices in the SHRIMP system: an empirical study
Proceedings of the 25th annual international symposium on Computer architecture
Resource Deadlocks and Performance of Wormhole Multicast Routing Algorithms
IEEE Transactions on Parallel and Distributed Systems
Pc-based Shared Memory Architecture and Language
The Journal of Supercomputing
25 years of the international symposia on Computer architecture (selected papers)
The Stanford FLASH multiprocessor
25 years of the international symposia on Computer architecture (selected papers)
The MIT Alewife machine: architecture and performance
25 years of the international symposia on Computer architecture (selected papers)
Wormhole routing techniques for directly connected multicomputer systems
ACM Computing Surveys (CSUR)
Dynamically Configurable Message Flow Control for Fault-Tolerant Routing
IEEE Transactions on Parallel and Distributed Systems
Exploiting ILP in page-based intelligent memory
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Adaptive-Trail Routing and Performance Evaluation in Irregular Networks Using Cut-Through Switches
IEEE Transactions on Parallel and Distributed Systems
Submesh Determination in Faulty Tori and Meshes
IEEE Transactions on Parallel and Distributed Systems
Asynchrony in parallel computing: from dataflow to multithreading
Progress in computer research
A lightweight idempotent messaging protocol for faulty networks
Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures
Asynchrony in parallel computing: from dataflow to multithreading
Progress in computer research
The performance of the cedar multistage switching network
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Fault-tolerant routing with non-adaptive wormhole algorithms in mesh networks
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Maté: a tiny virtual machine for sensor networks
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Hypermeshes: implementation and performance
Journal of Systems Architecture: the EUROMICRO Journal
IEEE Micro
Fault-Tolerant Wormhole Routing Algorithms for Mesh Networks
IEEE Transactions on Computers
Communication in Multicomputers with Nonconvex Faults
IEEE Transactions on Computers
Performance Analysis of Mesh Interconnection Networks with Deterministic Routing
IEEE Transactions on Parallel and Distributed Systems
Resource Placement in Torus-Based Networks
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Processor Allocation in the Mesh Multiprocessors Using the Leapfrog Method
IEEE Transactions on Parallel and Distributed Systems
VLSI Architecture: Past, Present, and Future
ARVLSI '99 Proceedings of the 20th Anniversary Conference on Advanced Research in VLSI
Spare processor allocation for fault tolerance in torus-based multicomputers
FTCS '96 Proceedings of the The Twenty-Sixth Annual International Symposium on Fault-Tolerant Computing (FTCS '96)
Active I/O Switches in System Area Networks
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Reducing Cost and Tolerating Defects in Page-based Intelligent Memory
ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
Design and implementation of a multicomputer interconnection network using FPGAs
FCCM '95 Proceedings of the IEEE Symposium on FPGA's for Custom Computing Machines
Resource Placements in 2D Tori
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Enhanced Cluster k-Ary n-Cube, A Fault-Tolerant Multiprocessor
IEEE Transactions on Computers
On resource placements in 3D tori
Journal of Parallel and Distributed Computing - Special section best papers from the 2002 international parallel and distributed processing symposium
Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams
Proceedings of the 31st annual international symposium on Computer architecture
Optical transpose k-ary n-cube networks
Journal of Systems Architecture: the EUROMICRO Journal
Analysis and Modeling of Advanced PIM Architecture Design Tradeoffs
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
A low cost, multithreaded processing-in-memory system
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Microarchitecture of a High-Radix Router
Proceedings of the 32nd annual international symposium on Computer Architecture
Hardware-modulated parallelism in chip multiprocessors
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
A plane-based broadcast algorithm for multicomputer networks
Journal of Systems Architecture: the EUROMICRO Journal
ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 01
Microprocessors & Microsystems
A new approach to model virtual channels in interconnection networks
Journal of Computer and System Sciences
TriBA: a novel scalable architecture for high performance parallel computing applications
ACOS'07 Proceedings of the 6th Conference on WSEAS International Conference on Applied Computer Science - Volume 6
Combinatorial performance modelling of toroidal cubes
Journal of Systems Architecture: the EUROMICRO Journal
A new general method to compute virtual channels occupancy probabilities in wormhole networks
Journal of Computer and System Sciences
Servo: a programming model for many-core computing
ACM SIGARCH Computer Architecture News
Future Generation Computer Systems
Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation
Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation
Resource placement in three-dimensional tori
Parallel Computing
Using a configurable processor generator for computer architecture prototyping
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Flexible architectural support for fine-grain scheduling
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
A dynamic programming algorithm for simulation of a multi-dimensional torus in a crossed cube
Information Sciences: an International Journal
Information Sciences: an International Journal
Parallel algorithms for finding polynomial Roots on OTIS-torus
The Journal of Supercomputing
Exploiting 162-Nanosecond End-to-End Communication Latency on Anton
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
On the probability distribution of busy virtual channels
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Proceedings of the international conference on Supercomputing
One-to-one disjoint path covers on k-ary n-cubes
Theoretical Computer Science
Upper bounds on the connection probability for 2-D meshes and tori
Journal of Parallel and Distributed Computing
Configurable fine-grain protection for multicore processor virtualization
Proceedings of the 39th Annual International Symposium on Computer Architecture
EventWave: programming model and runtime support for tightly-coupled elastic cloud applications
Proceedings of the 4th annual Symposium on Cloud Computing
Hi-index | 0.02 |
The MIT J-Machine multicomputer has been constructed to study the role of a set of primitive mechanisms in providing efficient support for parallel computing. Each J-Machine node consists of an integrated multicomputer component, the Message-Driven Processor (MDP), and 1 MByte of DRAM. The MDP provides mechanisms to support efficient communication, synchronization, and naming. A 512 node J-Machine is operational and is due to be expanded to 1024 nodes in March 1993. In this paper we discuss the design of the J-Machine and evaluate the effectiveness of the mechanisms incorporated into the MDP. We measure the performance of the communication and synchronization mechanisms directly and investigate the behavior of four complete applications.