Array processor with multiple broadcasting
Journal of Parallel and Distributed Computing
VLSI assist for a multiprocessor
ASPLOS II Proceedings of the second international conference on Architectual support for programming languages and operating systems
Coherency for multiprocessor virtual address caches
ASPLOS II Proceedings of the second international conference on Architectual support for programming languages and operating systems
Applications considerations in the system design of highly concurrent multiprocessors
IEEE Transactions on Computers
A mean-value performance analysis of a new multiprocessor architecture
SIGMETRICS '88 Proceedings of the 1988 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Using cache memory to reduce processor-memory traffic
ISCA '83 Proceedings of the 10th annual international symposium on Computer architecture
Dynamic decentralized cache schemes for mimd parallel processors
ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
Software structures for ultraparallel computing
Software structures for ultraparallel computing
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Efficient synchronization primitives for large-scale cache-coherent multiprocessors
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Reference history, page size, and migration daemons in local/remote architectures
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Simple but effective techniques for NUMA memory management
SOSP '89 Proceedings of the twelfth ACM symposium on Operating systems principles
Evaluating the performance of four snooping cache coherency protocols
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Inexpensive implementations of set-associativity
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Organization and performance of a two-level virtual-real cache hierarchy
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Introducing memory into the switch elements of multiprocessor interconnection networks
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Multiple vs. wide shared bus multiprocessors
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
C2MP: a cache-coherent, distributed memory multiprocessor-system
Proceedings of the 1989 ACM/IEEE conference on Supercomputing
Multicast tree construction in bus-based networks
Communications of the ACM
Analysis of critical architectural and programming parameters in a hierarchical
SIGMETRICS '90 Proceedings of the 1990 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Computer
Paradigm: A Highly Scalable Shared-Memory Multicomputer Architecture
Computer - Special issue on cryptography
Hector: A Hierarchically Structured Shared-Memory Multiprocessor
Computer - Special issue on experimental research in computer architecture
Algorithms for scalable synchronization on shared-memory multiprocessors
ACM Transactions on Computer Systems (TOCS)
Processor-pool-based scheduling for large-scale NUMA multiprocessors
SIGMETRICS '91 Proceedings of the 1991 ACM SIGMETRICS conference on Measurement and modeling of computer systems
PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
Experimental comparison of memory management policies for NUMA multiprocessors
ACM Transactions on Computer Systems (TOCS)
The expandable split window paradigm for exploiting fine-grain parallelsim
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Willow: a scalable shared memory multiprocessor
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
A scalable snoopy coherence scheme on distributed shared-memory multiprocessors
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
ACM Transactions on Programming Languages and Systems (TOPLAS)
Cache coherence in large-scale shared-memory multiprocessors: issues and comparisons
ACM Computing Surveys (CSUR)
ICS '93 Proceedings of the 7th international conference on Supercomputing
A checkpoint protocol for an entry consistent shared memory system
PODC '94 Proceedings of the thirteenth annual ACM symposium on Principles of distributed computing
A comprehensive bibliography of distributed shared memory
ACM SIGOPS Operating Systems Review
Boosting the performance of hybrid snooping cache protocols
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Architectural mechanisms for explicit communication in shared memory multiprocessors
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Techniques for reducing overheads of shared-memory multiprocessing
ICS '95 Proceedings of the 9th international conference on Supercomputing
An analytical model of high performance superscalar-based multiprocessors
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Coherent network interfaces for fine-grain communication
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Fast Gossiping on Mesh-Bus Computers
IEEE Transactions on Computers
Systematic Design of Fault-Tolerant Multiprocessors with Shared Buses
IEEE Transactions on Computers
Efficient synchronization: let them eat QOLB
Proceedings of the 24th annual international symposium on Computer architecture
A study of three dynamic approaches to handle widely shared data in shared-memory multiprocessors
ICS '98 Proceedings of the 12th international conference on Supercomputing
An empirical evaluation of two memory-efficient directory methods
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
An asynchronous protocol for release consistent distributed shared memory systems
SAC '00 Proceedings of the 2000 ACM symposium on Applied computing - Volume 2
IEEE Transactions on Parallel and Distributed Systems
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
False Sharing and Spatial Locality in Multiprocessor Caches
IEEE Transactions on Computers
The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Design of an Adaptive Cache Coherence Protocol for Large Scale Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
A Virtual Bus Architecture for Dynamic Parallel Processing
IEEE Transactions on Parallel and Distributed Systems
Performance of Pruning-Cache Directories for Large-Scale Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Connective Fault Tolerance in Multiple-Bus Systems
IEEE Transactions on Parallel and Distributed Systems
Two techniques for improving performance on bus-based multiprocessors
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Two Adaptive Hybrid Cache Coherency Protocols
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
A Scalable Interconnection Network Architecture for Petaflops Computing
The Journal of Supercomputing
Coupling compiler-enabled and conventional memory accessing for energy efficiency
ACM Transactions on Computer Systems (TOCS)
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Using continuations to build a user-level threads library
MSYM'93 Proceedings of the 3rd conference on USENIX MACH III Symposium - Volume 1
Leveraging on-chip networks for data cache migration in chip multiprocessors
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Transactional conflict decoupling and value prediction
Proceedings of the international conference on Supercomputing
Edge chasing delayed consistency: pushing the limits of weak memory models
Proceedings of the 2012 ACM workshop on Relaxing synchronization for multicore and manycore scalability
Hi-index | 0.03 |
The Wisconsin Multicube, is a large-scale, shared-memory multiprocessor architecture that employs a snooping cache protocol over a grid of buses. Each processor has a conventional (SRAM) cache optimized to minimize memory latency and a large (DRAM) snooping cache optimized to reduce bus traffic and to maintain consistency. The large snooping cache should guarantee that nearly all the traffic on the buses will be generated by I/O and accesses to shared data.The programmer's view of the system is like a multi -- a set of processors having access to a common shared memory with no notion of geographical locality. Thus writing software, including the operating system, should be a straightforward extension of those techniques being developed for multis.The interconnection topology allows for a cache-coherent protocol for which most bus requests can be satisfied with no more than twice the number of bus operations required of a single-bus multi. The total symmetry guarantees that there are no topology-induced bottlenecks. The total bus bandwidth grows in proportion to the product of the number of processors and the average path length.The proposed architecture is an example of a new class of interconnection topologies -- the Multicube -- which consists of N =nk processors, where each processor is connected to k buses and each bus is connected to n processors. The hypercube is a special case where n=2. The Wisconsin Multicube is a two-dimensional Multicube (k=2), where n scales to about 32, resulting in a proposed system of over 1,000 processors.