Chip Multithreading: Opportunities and Challenges

Authors:
Lawrence Spracklen; Santosh G. Abraham
Affiliations:
Sun Microsystems Inc., Sunnyvale, CA; Sun Microsystems Inc., Sunnyvale, CA
Venue:
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Year:
2005


Abstract

Chip Multi-Threaded (CMT) processors provide support for many simultaneous hardware threads of execution in various ways, including Simultaneous Multithreading (SMT) and Chip Multiprocessing (CMP). CMT processors are especially suited to server workloads, which generally have high levels of Thread-Level Parallelism (TLP). In this paper, we describe the evolution of CMT chips in industry and highlight the pervasiveness of CMT designs in upcoming general-purpose processors. The CMT design space accommodates a range of designs between the extremes represented by the SMT and CMP designs, and a variety of attractive design options are currently unexplored. Though there has been extensive research on utilizing multiple hardware threads to speed up single-threaded applications via speculative parallelization, there are many challenges in designing CMT processors, even when sufficient TLP is present. This paper describes some of these challenges, including hot sets, hot banks, speculative prefetching strategies, request prioritization, and off-chip bandwidth reduction.
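
The abstract names "hot sets" without defining them. As a rough illustration only, the sketch below assumes a hot set is a cache set that receives more distinct lines than it has ways, because many threads touch the same offset in identically aligned private regions. The cache geometry (LINE_BYTES, NUM_SETS, WAYS), the modulo indexing, and all helper names are hypothetical and are not taken from the paper.

```python
from collections import Counter

# Assumed geometry of a shared cache (illustrative values only).
LINE_BYTES = 64
NUM_SETS = 1024
WAYS = 4


def lines_per_set(addresses):
    """Count the distinct cache lines that map to each set."""
    distinct_lines = {addr // LINE_BYTES for addr in addresses}
    return Counter(line % NUM_SETS for line in distinct_lines)


def thread_addresses(num_threads, region_bytes, offset):
    """One access per thread, at the same offset in its private region."""
    return [t * region_bytes + offset for t in range(num_threads)]


# Regions spaced by exactly LINE_BYTES * NUM_SETS: every thread's access
# lands in the same set.
aligned = lines_per_set(thread_addresses(32, LINE_BYTES * NUM_SETS, 0x40))

# Regions staggered by one extra line per thread: accesses spread out.
staggered = lines_per_set(
    thread_addresses(32, LINE_BYTES * NUM_SETS + LINE_BYTES, 0x40)
)

hot_aligned = max(aligned.values())
hot_staggered = max(staggered.values())

print("aligned:   busiest set holds", hot_aligned, "lines for", WAYS, "ways")
print("staggered: busiest set holds", hot_staggered, "line(s)")
```

Under these assumptions the aligned layout places all 32 lines in one 4-way set, while staggering the regions by a single line spreads them over 32 different sets. Whether the paper proposes this or another remedy is not stated in the abstract.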