This paper describes the architecture of the Hector multiprocessor, which exploits current microprocessor technology to achieve a good cost/performance tradeoff. A key design feature of Hector is its interconnection backplane. The backplane can accommodate future technology because it uses simple hardware, with short critical paths in its logic circuits and short lines in its interconnection network. The system is reliable and flexible, and it can be built at relatively low cost. Its hierarchical structure yields a fast backplane and a bandwidth that increases linearly with the number of processors, so Hector scales efficiently to larger configurations and faster processors.
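A simple counting model illustrates why bandwidth can grow linearly with processor count in a hierarchical machine. The sketch below is an idealization under stated assumptions, not Hector's actual design data. It treats the machine as a tree of interconnect units (for illustration, station buses grouped on local rings that are joined by a global ring) and gives each lowest-level unit a fixed bandwidth `unit_bw`. It also assumes that all traffic stays local, so the lowest-level units operate in parallel. The function names, level names and fan-out values are hypothetical.

```python
from math import prod

# Hypothetical model: fanouts[0] = processors per station bus,
# fanouts[1] = stations per local ring, fanouts[2] = local rings on the global ring.
# The three-level shape and level names are illustrative assumptions.

def processors(fanouts):
    """Total number of processors in the hierarchy."""
    return prod(fanouts)

def units_per_level(fanouts):
    """Number of interconnect units (buses or rings) at each level, lowest first.

    A level-i unit joins fanouts[i] children, and there is one such unit for
    every combination of the fan-outs above it.
    """
    return [prod(fanouts[i + 1:]) for i in range(len(fanouts))]

def local_bandwidth(fanouts, unit_bw):
    """Idealized aggregate bandwidth if every transfer stays on its own station bus.

    Lowest-level units operate independently, so their capacities simply add.
    """
    return units_per_level(fanouts)[0] * unit_bw

if __name__ == "__main__":
    # Grow the machine by adding local rings; bandwidth per processor stays constant.
    for rings in (2, 4, 8):
        f = [4, 4, rings]
        print(processors(f), "processors ->", local_bandwidth(f, 1.0), "bus-bandwidth units")
```

In this model, doubling the number of local rings doubles both the processor count and the number of independent station buses, so the aggregate bandwidth per processor stays fixed. The model ignores traffic that crosses the higher levels of the hierarchy, so it holds only to the extent that accesses are local.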