Hector: A Hierarchically Structured Shared-Memory Multiprocessor

Authors: Zvonko G. Vranesic, Michael Stumm, David M. Lewis, Ron White
Affiliation: University of Toronto, Toronto, Ontario, Canada (all authors)
Venue: Computer, special issue on experimental research in computer architecture
Year: 1991

Abstract

This article describes the architecture of the Hector multiprocessor, which exploits current microprocessor technology to achieve a good cost/performance tradeoff. A key design feature of Hector is its interconnection backplane, which can accommodate future technology because it uses simple hardware with short critical paths in the logic circuits and short lines in the interconnection network. The system is reliable and flexible, and it can be built at relatively low cost. Its hierarchical structure yields a fast backplane and a bandwidth that increases linearly with the number of processors. As a result, Hector scales efficiently to larger system sizes and to faster processors.
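The claim that bandwidth grows linearly with the number of processors can be illustrated with a simple link-counting model. The sketch below is not taken from the article. It assumes a three-level hierarchy of bus-based stations, local rings, and a global ring, and the names `HierarchyConfig`, `independent_links`, and `peak_aggregate_bw`, along with all parameter values, are hypothetical. It counts idealized peak link capacity only and ignores contention, protocol overhead, and access locality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HierarchyConfig:
    """Hypothetical station / local-ring / global-ring configuration (illustration only)."""
    procs_per_station: int  # processors sharing one station bus
    stations_per_ring: int  # stations attached to each local ring
    local_rings: int        # local rings joined by the global ring
    link_bw: float = 1.0    # assumed peak bandwidth of any single bus or ring segment


def processors(cfg: HierarchyConfig) -> int:
    """Total number of processors in the configuration."""
    return cfg.procs_per_station * cfg.stations_per_ring * cfg.local_rings


def independent_links(cfg: HierarchyConfig) -> int:
    """Count buses and ring segments that can, in principle, transfer data concurrently."""
    # One bus per station.
    buses = cfg.stations_per_ring * cfg.local_rings
    # A unidirectional ring with n nodes has n point-to-point segments; each local
    # ring carries its stations plus one inter-ring interface (an assumption here).
    local_segments = cfg.local_rings * (cfg.stations_per_ring + 1)
    # The global ring links one interface per local ring; a single local ring needs none.
    global_segments = cfg.local_rings if cfg.local_rings > 1 else 0
    return buses + local_segments + global_segments


def peak_aggregate_bw(cfg: HierarchyConfig) -> float:
    """Idealized peak bandwidth: every link busy at once, no contention."""
    return independent_links(cfg) * cfg.link_bw


if __name__ == "__main__":
    # Hold the fan-out at each level fixed and grow only the number of local rings.
    for rings in (2, 4, 8):
        cfg = HierarchyConfig(procs_per_station=4, stations_per_ring=4, local_rings=rings)
        n, bw = processors(cfg), peak_aggregate_bw(cfg)
        print(f"{n:4d} processors  peak {bw:6.1f}  per processor {bw / n:.3f}")
```

With the per-level fan-outs held fixed, each added local ring contributes the same number of buses and ring segments, so in this model peak aggregate bandwidth doubles whenever the processor count doubles and the per-processor figure stays constant. Sustained bandwidth in a real machine also depends on locality: requests that cross the global ring share its comparatively few segments.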