The shared-thread multiprocessor

Authors:
Jeffery A. Brown; Dean M. Tullsen
Affiliations:
UC San Diego, La Jolla, CA, USA; UC San Diego, La Jolla, CA, USA
Venue:
Proceedings of the 22nd annual international conference on Supercomputing
Year:
2008

Abstract

This paper describes initial results for an architecture called the Shared-Thread Multiprocessor (STMP). The STMP combines features of a multithreaded processor and a chip multiprocessor; specifically, it enables distinct cores on a chip multiprocessor to share thread state. With this shared thread state, the system can schedule threads from a shared pool onto individual cores and move threads between cores rapidly. This paper demonstrates and evaluates three benefits of this architecture: (1) By providing more thread state storage than is available in the cores themselves, the architecture enjoys the ILP benefits of many threads but carries the in-core complexity of supporting just a few. (2) Threads can move between cores fast enough to hide long-latency events such as memory accesses, which enables very-short-term load balancing in response to such events. (3) The system can redistribute threads to maximize symbiotic behavior and balance load much more often than traditional operating system thread scheduling and context switching permit.
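
The shared-pool idea in the abstract can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the hardware mechanism or scheduling policy evaluated in the paper: the names `Core`, `SharedThreadPool`, and `on_long_latency_miss` are hypothetical, the pool is a simple FIFO queue, and a stalled thread is always swapped out whenever another thread is waiting.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Core:
    """A core with a small, fixed number of hardware thread contexts."""
    slots: int
    running: list = field(default_factory=list)


class SharedThreadPool:
    """Hypothetical model of off-core storage for inactive thread state."""

    def __init__(self, cores, thread_ids):
        self.cores = cores
        self.waiting = deque(thread_ids)
        # Fill every core's contexts from the shared pool, in order.
        for core in cores:
            while len(core.running) < core.slots and self.waiting:
                core.running.append(self.waiting.popleft())

    def on_long_latency_miss(self, core, thread_id):
        """Swap a stalled thread out and bring the oldest waiting thread in.

        Returns the id of the thread swapped in, or None if no thread is
        waiting (the stalled thread then simply keeps its context).
        """
        if not self.waiting:
            return None
        core.running.remove(thread_id)
        self.waiting.append(thread_id)      # stalled thread goes to the back
        incoming = self.waiting.popleft()   # oldest waiting thread comes in
        core.running.append(incoming)
        return incoming


# Two cores with two contexts each, six threads in total (assumed sizes).
cores = [Core(slots=2), Core(slots=2)]
pool = SharedThreadPool(cores, thread_ids=range(6))

pool.on_long_latency_miss(cores[0], 1)  # thread 1 stalls on core 0
pool.on_long_latency_miss(cores[1], 2)  # thread 2 stalls on core 1
pool.on_long_latency_miss(cores[1], 3)  # thread 3 stalls; thread 1 resumes on core 1
```

In this model each core holds only two contexts while the system as a whole keeps six threads alive, and thread 1 ends up on a different core from the one it started on. Both effects mirror, in a very simplified way, benefits (1) and (2) listed in the abstract.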