The Stanford Dash Multiprocessor
Computer
The Stanford FLASH multiprocessor
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
The performance impact of flexibility in the Stanford FLASH multiprocessor
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
The MIT Alewife machine: architecture and performance
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Simultaneous multithreading: maximizing on-chip parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Efficient strategies for software-only protocols in shared-memory multiprocessors
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Decoupled hardware support for distributed shared memory
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
STiNG: a CC-NUMA computer system for the commercial marketplace
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Coherence controller architectures for SMP-based CC-NUMA multiprocessors
Proceedings of the 24th annual international symposium on Computer architecture
The SGI Origin: a ccNUMA highly scalable server
Proceedings of the 24th annual international symposium on Computer architecture
Proceedings of the 25th annual international symposium on Computer architecture
Improving prediction for procedure returns with return-address-stack repair mechanisms
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Piranha: a scalable architecture based on single-chip multiprocessing
Proceedings of the 27th annual international symposium on Computer architecture
FLASH vs. (Simulated) FLASH: closing the simulation loop
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Execution-based prediction using speculative slices
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Slipstream processors: improving both performance and fault tolerance
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Transient-fault recovery using simultaneous multithreading
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Detailed design and evaluation of redundant multithreading alternatives
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
ReVive: cost-effective architectural support for rollback recovery in shared-memory multiprocessors
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Difficult-path branch prediction using subordinate microthreads
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Parallel Computer Architecture: A Hardware/Software Approach
Parallel Computer Architecture: A Hardware/Software Approach
Dynamic speculative precomputation
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
The MIPS R10000 Superscalar Microprocessor
IEEE Micro
Spider: A High-Speed Network Interconnect
IEEE Micro
The Alpha 21264 Microprocessor
IEEE Micro
Active Memory Clusters: Efficient Multiprocessing on Commodity Clusters
ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
Performance analysis of the Alpha 21364-based HP GS1280 multiprocessor
Proceedings of the 30th annual international symposium on Computer architecture
Speculative Data-Driven Multithreading
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
ACM SIGARCH Computer Architecture News
Architectural Support for Uniprocessor and Multiprocessor Active Memory Systems
IEEE Transactions on Computers
TMA: a trap-based memory architecture
Proceedings of the 20th annual international conference on Supercomputing
An embedded coherent-multithreading multimedia processor and its programming model
Proceedings of the 44th annual Design Automation Conference
A case for low-complexity MP architectures
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Proceedings of the Conference on Design, Automation and Test in Europe
Exploiting locality: a flexible DSM approach
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Hi-index | 0.00 |
We introduce the SMTp architecture-an SMT processoraugmented with a coherence protocol thread context,that together with a standard integrated memory controllercan enable the design of (among other possibilities) scalablecache-coherent hardware distributed shared memory(DSM) machines from commodity nodes. We describe theminor changes needed to a conventional out-of-order multi-threadedcore to realize SMTp, discussing issues related toboth deadlock avoidance and performance. We then compareSMTp performance to that of various conventionalDSM machines with normal SMT processors both with andwithout integrated memory controllers. On configurationsfrom 1 to 32 nodes, with 1 to 4 application threads pernode, we find that SMTp delivers performance comparableto, and sometimes better than, machines with more complexintegrated DSM-specific memory controllers. Our resultsalso show that the protocol thread has extremely lowpipeline overhead. Given the simplicity and the flexibility ofthe SMTp mechanism, we argue that next-generation multi-threadedprocessors with integrated memory controllersshould adopt this mechanism as a way of building less complexhigh-performance DSM multiprocessors.