Simultaneous Multithreading: A Platform for Next-Generation Processors

Authors:
Susan J. Eggers; Joel S. Emer; Henry M. Levy; Jack L. Lo; Rebecca L. Stamm; Dean M. Tullsen
Venue:
IEEE Micro
Year:
1997

References: 14
Cited by: 93

An elementary processor architecture with simultaneous instruction issuing from multiple threads

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Processor coupling: integrating compile time and runtime scheduling for parallelism

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
The multiflow trace scheduling compiler

The Journal of Supercomputing - Special issue on instruction-level parallelism
Multiscalar processors

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Increasing superscalar performance through multistreaming

PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
The case for a single-chip multiprocessor

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
The Tera computer system

ICS '90 Proceedings of the 4th international conference on Supercomputing
Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading

ACM Transactions on Computer Systems (TOCS)
Complexity-effective superscalar processors

Proceedings of the 24th annual international symposium on Computer architecture
The multicluster architecture: reducing cycle time through partitioning

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Maximizing Multiprocessor Performance with the SUIF Compiler

Computer
Performance Study of a Multithreaded Superscalar Microprocessor

HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
The Superthreaded Architecture: Thread Pipelining with Run-Time Data Dependence Checking and Control Speculation

PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques

Tuning compiler optimizations for simultaneous multithreading

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
An analysis of database workload performance on simultaneous multithreaded processors

Proceedings of the 25th annual international symposium on Computer architecture
An evaluation of staged run-time optimizations in DyC

Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
An Algorithm-Hardware-System Approach to VLIW Multimedia Processors

Journal of VLSI Signal Processing Systems - special issue on multimedia signal processing
Software-Directed Register Deallocation for Simultaneous Multithreaded Processors

IEEE Transactions on Parallel and Distributed Systems
Piranha: a scalable architecture based on single-chip multiprocessing

Proceedings of the 27th annual international symposium on Computer architecture
Tuning Compiler Optimizations for Simultaneous Multithreading

International Journal of Parallel Programming - Special issue on the 30th annual ACM/IEEE international symposium on microarchitecture, part II
An analysis of operating system behavior on a simultaneous multithreaded architecture

ACM SIGPLAN Notices
The benefits and costs of DyC's run-time optimizations

ACM Transactions on Programming Languages and Systems (TOPLAS)
Analytical cache models with applications to cache partitioning

ICS '01 Proceedings of the 15th international conference on Supercomputing
An analysis of operating system behavior on a simultaneous multithreaded architecture

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Asynchrony in parallel computing: from dataflow to multithreading

Progress in computer research
SMT Layout Overhead and Scalability

IEEE Transactions on Parallel and Distributed Systems
Performance of a micro-threaded pipeline

CRPIT '02 Proceedings of the seventh Asia-Pacific conference on Computer systems architecture
Correctly implementing value prediction in microprocessors that support multithreading or multiprocessing

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Asynchrony in parallel computing: from dataflow to multithreading

Progress in computer research
A case for user-level interrupts

ACM SIGARCH Computer Architecture News
Demonstrating the Scalability of a Molecular Dynamics Application on a Petaflops Computer

International Journal of Parallel Programming
Computer Systems Research: The Pressure Is On

Computer
Analysis of performance bottlenecks in multithreaded multiprocessor systems

Fundamenta Informaticae - Application of concurrency to system design
A survey of processors with explicit multithreading

ACM Computing Surveys (CSUR)
Boosting SMT Performance by Speculation Control

IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
COOL Approach to Petaflops Computing (invited paper)

PaCT '99 Proceedings of the 5th International Conference on Parallel Computing Technologies
A Queuing Model of a Multi-threaded Architecture: A Case Study

PaCT '99 Proceedings of the 5th International Conference on Parallel Computing Technologies
Time-Sharing Parallel Jobs in the Presence of Multiple Resource Requirements

IPDPS '00/JSSPP '00 Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing
Effects of Memory Performance on Parallel Job Scheduling

JSSPP '01 Revised Papers from the 7th International Workshop on Job Scheduling Strategies for Parallel Processing
Improving the Performance of Heterogeneous DSMs via Multithreading

VECPAR '00 Selected Papers and Invited Talks from the 4th International Conference on Vector and Parallel Processing
Dissecting Cyclops: a detailed analysis of a multithreaded architecture

ACM SIGARCH Computer Architecture News
Improving server software support for simultaneous multithreaded processors

Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Simultaneous Multithreading-Based Routers

ICPP '00 Proceedings of the 2000 International Conference on Parallel Processing
Dynamic Partitioning of Shared Cache Memory

The Journal of Supercomputing
The need for adaptive dynamic thread scheduling

High performance scientific and engineering computing
Fighting the memory wall with assisted execution

Proceedings of the 1st conference on Computing frontiers
A retrospective on: "an evaluation of staged run-time optimizations in DyC"

ACM SIGPLAN Notices - Best of PLDI 1979-1999
Architectural Support for Enhanced SMT Job Scheduling

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Area and System Clock Effects on SMT/CMP Throughput

IEEE Transactions on Computers
Optimizing NANOS OpenMP for the IBM Cyclops Multithreaded Architecture

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Reducing coherence overhead and boosting performance of high-end SMP multiprocessors running a DSS workload

Journal of Parallel and Distributed Computing
Evaluating the impact of simultaneous multithreading on network servers using real hardware

SIGMETRICS '05 Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Cooperative multithreading on embedded multiprocessor architectures enables energy-scalable design

Proceedings of the 42nd annual Design Automation Conference
Adaptive execution techniques for SMT multiprocessor architectures

Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Multithreaded architectures and the sort benchmark

DaMoN '05 Proceedings of the 1st international workshop on Data management on new hardware
An Instruction Fetch Policy Handling L2 Cache Misses in SMT Processors

HPCASIA '05 Proceedings of the Eighth International Conference on High-Performance Computing in Asia-Pacific Region
SCMP: a single-chip message-passing parallel computer

The Journal of Supercomputing - Special issue: Parallel and distributed processing and applications
Database hash-join algorithms on multithreaded computer architectures

Proceedings of the 3rd conference on Computing frontiers
Throttling-Based Resource Management in High Performance Multithreaded Architectures

IEEE Transactions on Computers
Speculative pre-execution assisted by compiler (SPEAR)

Journal of Parallel and Distributed Computing - Special issue on parallel bioinspired algorithms
Thread-associative memory for multicore and multithreaded computing

Proceedings of the 2006 international symposium on Low power electronics and design
Design and evaluation of a hierarchical decoupled architecture

The Journal of Supercomputing
The application kernel approach—a novel approach for adding SMP support to uniprocessor operating systems

Software—Practice & Experience
Multigrid and Gauss-Seidel smoothers revisited: parallelization on chip multiprocessors

Proceedings of the 20th annual international conference on Supercomputing
Adaptive dynamic thread scheduling for simultaneous multithreaded architectures with a detector thread

Journal of Parallel and Distributed Computing
An efficient implementation of a 3D wavelet transform based encoder on hyper-threading technology

Parallel Computing
Compiler-Managed Software-based Redundant Multi-Threading for Transient Fault Detection

Proceedings of the International Symposium on Code Generation and Optimization
Parallel protein secondary structure prediction schemes using Pthread and OpenMP over hyper-threading technology

The Journal of Supercomputing
Superscalar out-of-order demystified in four instructions

WCAE '03 Proceedings of the 2003 workshop on Computer architecture education: Held in conjunction with the 30th International Symposium on Computer Architecture
Resource area dilation to reduce power density in throughput servers

ISLPED '07 Proceedings of the 2007 international symposium on Low power electronics and design
Design of adaptive multiprocessor on chip systems

Proceedings of the 20th annual conference on Integrated circuits and systems design
Performance analysis and workload characterization of the 3DMark05 benchmark on modern parallel computer platforms

ACM SIGARCH Computer Architecture News - Special issue on the 2006 reconfigurable and adaptive architecture workshop
Speeding-up multiprocessors running DBMS workloads through coherence protocols

International Journal of High Performance Computing and Networking
Pipelined hash-join on multithreaded architectures

DaMoN '07 Proceedings of the 3rd international workshop on Data management on new hardware
Exploiting multilevel parallelism using OpenMP on a massive multithreaded architecture

Journal of Embedded Computing - Issues in embedded single-chip multicore architectures
Vision platform for mobile intelligent robot based on 81.6 GOPS object recognition processor

Proceedings of the 45th annual Design Automation Conference
An adaptive resource partitioning algorithm for SMT processors

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
A low-complexity microprocessor design with speculative pre-execution

Journal of Systems Architecture: the EUROMICRO Journal
Improving error tolerance for multithreaded register files

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Issue Mechanism for Embedded Simultaneous Multithreading Processor

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
The impact of speculative execution on SMT processors

International Journal of Parallel Programming
Implementing AUTOSAR scheduling and resource management on an embedded SMT processor

Proceedings of the 12th International Workshop on Software and Compilers for Embedded Systems
81.6 GOPS object recognition processor based on a memory-centric NoC

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Fixed-priority scheduling on prioritized SMT processor

PDCS '07 Proceedings of the 19th IASTED International Conference on Parallel and Distributed Computing and Systems
Reconsidering algorithms for iterative solvers in the multicore era

International Journal of Computational Science and Engineering
A new multithreaded architecture supporting direct execution of Esterel

EURASIP Journal on Embedded Systems
Trace Cache Miss Rate

International Journal of Modelling and Simulation
Adaptive execution techniques of parallel programs for multiprocessors

Journal of Parallel and Distributed Computing
Communication assist for data driven multithreading

PCI'01 Proceedings of the 8th Panhellenic conference on Informatics
Evaluation of OpenMP for the cyclops multithreaded architecture

WOMPAT'03 Proceedings of the OpenMP applications and tools 2003 international conference on OpenMP shared memory parallel programming
Improving the performance of OpenMP by array privatization

WOMPAT'03 Proceedings of the OpenMP applications and tools 2003 international conference on OpenMP shared memory parallel programming
Mt-ADRES: multithreading on coarse-grained reconfigurable architecture

ARC'07 Proceedings of the 3rd international conference on Reconfigurable computing: architectures, tools and applications
Evaluating performance of new quad-core Intel® Xeon® 5500 family processors for HPC

PPAM'09 Proceedings of the 8th international conference on Parallel processing and applied mathematics: Part I
Dynamic instruction scheduling in a trace-based multi-threaded architecture

International Journal of Parallel Programming
Chip multithreaded consistency model

Journal of Computer Science and Technology
Temporal isolation on multiprocessing architectures

Proceedings of the 48th Design Automation Conference
A high-throughput, high-accuracy system-level simulation framework for system on chips

VLSI Design
A fetch policy maximizing throughput and fairness for two-context SMT processors

APPT'05 Proceedings of the 6th international conference on Advanced Parallel Processing Technologies
Static partitioning vs dynamic sharing of resources in simultaneous multithreading microarchitectures

APPT'05 Proceedings of the 6th international conference on Advanced Parallel Processing Technologies
Enhancing DCache warn fetch policy for SMT processors

ISPA'05 Proceedings of the Third international conference on Parallel and Distributed Processing and Applications
Virtualization challenges: a view from server consolidation perspective

VEE '12 Proceedings of the 8th ACM SIGPLAN/SIGOPS conference on Virtual Execution Environments
Trends and challenges in operating systems: from parallel computing to cloud computing

Concurrency and Computation: Practice & Experience
Experiments with WRF on Intel® Many Integrated Core (Intel MIC) architecture

IWOMP'12 Proceedings of the 8th international conference on OpenMP in a Heterogeneous World
Analysis of Performance Bottlenecks in Multithreaded Multiprocessor Systems

Fundamenta Informaticae - Application of Concurrency to System Design
Low-latency adaptive mode transitions and hierarchical power management in asymmetric clustered cores

ACM Transactions on Architecture and Code Optimization (TACO)
Multicore-based vector coprocessor sharing for performance and energy gains

ACM Transactions on Embedded Computing Systems (TECS) - Special issue on application-specific processors

Abstract

As the processor community prepares for a billion transistors on a chip, researchers continue to debate the most effective way to use them. One approach is to add more memory (either cache or primary memory) to the chip, but the performance gain from memory alone is limited. Another approach is to increase the level of systems integration, bringing support functions like graphics accelerators and I/O controllers on chip. Although integration lowers system costs and communication latency, the overall performance gain to applications is again marginal. We believe the only way to significantly improve performance is to enhance the processor's computational capabilities. In general, this means increasing parallelism in all its available forms. At present only certain forms of parallelism are being exploited. Current superscalars, for example, can execute four or more instructions per cycle; in practice, however, they achieve only one or two, because current applications have low instruction-level parallelism. Placing multiple superscalar processors on a chip is also not an effective solution, because, in addition to the low instruction-level parallelism, performance suffers when there is little thread-level parallelism. A better solution is to design a processor that can exploit all types of parallelism well. Simultaneous multithreading (SMT) is a processor design that meets this goal, because it consumes both thread-level and instruction-level parallelism. In SMT processors, thread-level parallelism can come from either multithreaded, parallel programs or individual, independent programs in a multiprogramming workload. Instruction-level parallelism comes from each single program or thread. Because they successfully (and simultaneously) exploit both types of parallelism, SMT processors use resources more efficiently, and both instruction throughput and speedups are greater.
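
The issue-slot argument in the abstract can be illustrated with a toy model. The sketch below is not taken from the paper: the function names, the per-cycle "ready instruction" counts, and the fixed thread priority order are all assumptions made for illustration, and the model ignores carry-over of unissued instructions, caches, and fetch limits. It only shows how slots left empty by one low-ILP thread might be filled from other threads in the same cycle.

```python
from typing import List


def superscalar_issued(ready: List[int], width: int) -> int:
    """Instructions issued by a single-threaded superscalar (toy model).

    ready[c] is the hypothetical number of independent instructions the
    one running thread has available in cycle c.
    """
    return sum(min(width, r) for r in ready)


def smt_issued(ready_per_thread: List[List[int]], width: int) -> int:
    """Instructions issued when all threads share the issue slots each cycle.

    Threads are served in list order (an assumed, simplistic priority);
    each cycle stops issuing once all `width` slots are filled.
    """
    total = 0
    for cycle in zip(*ready_per_thread):
        free = width
        for r in cycle:
            take = min(free, r)
            total += take
            free -= take
    return total


if __name__ == "__main__":
    width = 4
    # Made-up workload: three low-ILP threads observed over four cycles.
    threads = [
        [1, 2, 1, 2],
        [2, 1, 0, 1],
        [1, 0, 1, 1],
    ]
    cycles = len(threads[0])
    print("superscalar IPC:", superscalar_issued(threads[0], width) / cycles)
    print("SMT IPC:", smt_issued(threads, width) / cycles)
```

With this invented workload, the single thread fills 6 of 16 slots (1.5 instructions per cycle), while sharing the slots among three threads fills 13 of 16 (3.25 per cycle). The numbers are illustrative only and say nothing about the results reported in the paper.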