The Wisconsin multicube: a new large-scale cache-coherent multiprocessor

Authors:
J. R. Goodman;P. J. Woest
Affiliations:
Univ. of Wisconsin, Madison;Univ. of Wisconsin, Madison
Venue:
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Year:
1988

Abstract

The Wisconsin Multicube is a large-scale, shared-memory multiprocessor architecture that employs a snooping cache protocol over a grid of buses. Each processor has a conventional (SRAM) cache optimized to minimize memory latency and a large (DRAM) snooping cache optimized to reduce bus traffic and to maintain consistency. The large snooping cache should guarantee that nearly all the traffic on the buses will be generated by I/O and accesses to shared data.

The programmer's view of the system is like a multi: a set of processors having access to a common shared memory with no notion of geographical locality. Thus writing software, including the operating system, should be a straightforward extension of those techniques being developed for multis.

The interconnection topology allows for a cache-coherent protocol for which most bus requests can be satisfied with no more than twice the number of bus operations required by a single-bus multi. The total symmetry guarantees that there are no topology-induced bottlenecks. The total bus bandwidth grows in proportion to the product of the number of processors and the average path length.

The proposed architecture is an example of a new class of interconnection topologies, the Multicube, which consists of N = n^k processors, where each processor is connected to k buses and each bus is connected to n processors. The hypercube is a special case where n = 2. The Wisconsin Multicube is a two-dimensional Multicube (k = 2), where n scales to about 32, resulting in a proposed system of over 1,000 processors.
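
The topology described in the abstract can be made concrete with a small sketch. It reads the processor count as N = n^k, labels each processor with k coordinates in the range 0..n-1, and places processors on the same bus when they agree in every coordinate but one. The function names (`multicube`, `bus_hops`) and the coordinate labeling are assumptions made for illustration; they are not taken from the paper, and the hop count below is a purely topological measure, not a model of the coherence protocol.

```python
from itertools import product

def multicube(n, k):
    """Build an (assumed) coordinate model of a Multicube.

    n: processors per bus, k: buses per processor (dimensions).
    Returns (processors, buses), where buses maps a bus id to the
    list of processors attached to it.
    """
    processors = list(product(range(n), repeat=k))  # N = n**k processors
    buses = {}
    for p in processors:
        for dim in range(k):
            # A bus runs along one dimension; it is identified by that
            # dimension plus the fixed coordinates in all other dimensions.
            bus_id = (dim, p[:dim] + p[dim + 1:])
            buses.setdefault(bus_id, []).append(p)
    return processors, buses

def bus_hops(p, q):
    """Minimum number of buses a message crosses from p to q:
    one per coordinate in which the two processors differ."""
    return sum(a != b for a, b in zip(p, q))

if __name__ == "__main__":
    procs, buses = multicube(32, 2)   # the two-dimensional case, n = 32
    print(len(procs), len(buses))     # 1024 64
    print(bus_hops((0, 0), (5, 7)))   # 2
```

With n = 32 and k = 2 the model yields 1,024 processors on 64 buses (32 row buses and 32 column buses), and any two processors are at most two buses apart. Setting n = 2 gives the hypercube case, where each bus degenerates to a point-to-point link.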