Limits on multiple instruction issue
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
High-bandwidth data memory systems for superscalar processors
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
New CPU benchmark suites from SPEC
COMPCON '92 Proceedings of the thirty-seventh international conference on COMPCON
Alternative implementations of two-level adaptive branch prediction
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
An elementary processor architecture with simultaneous instruction issuing from multiple threads
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Processor coupling: integrating compile time and runtime scheduling for parallelism
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Microarchitecture support for dynamic scheduling of acyclic task graphs
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
The multiflow trace scheduling compiler
The Journal of Supercomputing - Special issue on instruction-level parallelism
Fast and accurate instruction fetch and branch prediction
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Interleaving: a multithreading technique targeting multiprocessors and workstations
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Optimization of instruction fetch mechanisms for high issue rates
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Simultaneous multithreading: maximizing on-chip parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Increasing superscalar performance through multistreaming
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Proceedings of the 28th annual international symposium on Microarchitecture
ICS '90 Proceedings of the 4th international conference on Supercomputing
APRIL: a processor architecture for multiprocessing
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
The effects of STEF in finely parallel multithreaded processors
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Design and performance evaluation of a multithreaded architecture
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Performance Study of a Multithreaded Superscalar Microprocessor
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Assigning confidence to conditional branch predictions
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading
ACM Transactions on Computer Systems (TOCS)
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Tuning compiler optimizations for simultaneous multithreading
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The multicluster architecture: reducing cycle time through partitioning
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Initial results on the performance and cost of vector microprocessors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Multipath execution: opportunities and limits
ICS '98 Proceedings of the 12th international conference on Supercomputing
Speculative execution model with duplication
ICS '98 Proceedings of the 12th international conference on Supercomputing
An analysis of database workload performance on simultaneous multithreaded processors
Proceedings of the 25th annual international symposium on Computer architecture
Using prediction to accelerate coherence protocols
Proceedings of the 25th annual international symposium on Computer architecture
Threaded multiple path execution
Proceedings of the 25th annual international symposium on Computer architecture
Simultaneous subordinate microthreading (SSMT)
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
An Algorithm-Hardware-System Approach to VLIW Multimedia Processors
Journal of VLSI Signal Processing Systems - special issue on multimedia signal processing
On embedding a microarchitectural design language within Haskell
Proceedings of the fourth ACM SIGPLAN international conference on Functional programming
Instruction fetch mechanisms for multipath execution processors
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Hardware identification of cache conflict misses
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Delaying physical register allocation through virtual-physical registers
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
The use of multithreading for exception handling
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Software-Directed Register Deallocation for Simultaneous Multithreaded Processors
IEEE Transactions on Parallel and Distributed Systems
The Multicluster Architecture: Reducing Processor Cycle Time Through Partitioning
International Journal of Parallel Programming
Design Alternatives of Multithreaded Architecture
International Journal of Parallel Programming
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Transient fault detection via simultaneous multithreading
Proceedings of the 27th annual international symposium on Computer architecture
Multiple-banked register file architectures
Proceedings of the 27th annual international symposium on Computer architecture
Tuning Compiler Optimizations for Simultaneous Multithreading
International Journal of Parallel Programming - Special issue on the 30th annual ACM/IEEE international symposium on microarchitecture, part II
Symbiotic jobscheduling for a simultaneous mutlithreading processor
ACM SIGPLAN Notices
Slipstream processors: improving both performance and fault tolerance
ACM SIGPLAN Notices
A study of slipstream processors
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Relational profiling: enabling thread-level parallelism in virtual machines
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
The benefits and costs of DyC's run-time optimizations
ACM Transactions on Programming Languages and Systems (TOPLAS)
Improving 3D geometry transformations on a simultaneous multithreaded SIMD processor
ICS '01 Proceedings of the 15th international conference on Supercomputing
Slice-processors: an implementation of operation-based prediction
ICS '01 Proceedings of the 15th international conference on Supercomputing
Symbiotic jobscheduling for a simultaneous multithreaded processor
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
An analysis of operating system behavior on a simultaneous multithreaded architecture
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Execution-based prediction using speculative slices
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Slipstream processors: improving both performance and fault tolerance
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Speculative precomputation: long-range prefetching of delinquent loads
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Comparing power consumption of an SMT and a CMP DSP for mobile phone workloads
CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
Runtime identification of cache conflict misses: The adaptive miss buffer
ACM Transactions on Computer Systems (TOCS)
Improving Latency Tolerance of Multithreading through Decoupling
IEEE Transactions on Computers
Asynchrony in parallel computing: from dataflow to multithreading
Progress in computer research
SMT Layout Overhead and Scalability
IEEE Transactions on Parallel and Distributed Systems
Symbiotic jobscheduling with priorities for a simultaneous multithreading processor
SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Efficient dynamic scheduling through tag elimination
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Transient-fault recovery using simultaneous multithreading
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Detailed design and evaluation of redundant multithreading alternatives
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Tarantula: a vector extension to the alpha architecture
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Design tradeoffs for the Alpha EV8 conditional branch predictor
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Dynamic speculative precomputation
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Handling long-latency loads in a simultaneous multithreading processor
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Asynchrony in parallel computing: from dataflow to multithreading
Progress in computer research
Design and evaluation of compiler algorithms for pre-execution
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Enhancing software reliability with speculative threads
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Speculative Multithreaded Processors
Computer
Computer
Dependable Computing and Online Testing in Adaptive and Configurable Systems
IEEE Design & Test
A survey of processors with explicit multithreading
ACM Computing Surveys (CSUR)
Amir Roth: Speculative Multithreaded Processors
HiPC '00 Proceedings of the 7th International Conference on High Performance Computing
Boosting SMT Performance by Speculation Control
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Transparent Threads: Resource Sharing in SMT Processors for High Single-Thread Performance
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
The Case for Speculative Multithreading on SMT Processors
ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
Formal Verification of Explicitly Parallel Microprocessors
CHARME '99 Proceedings of the 10th IFIP WG 10.5 Advanced Research Working Conference on Correct Hardware Design and Verification Methods
Microprocessors - 10 Years Back, 10 Years Ahead
Informatics - 10 Years Back. 10 Years Ahead.
Automatic pool allocation for disjoint data structures
Proceedings of the 2002 workshop on Memory system performance
Pointer cache assisted prefetching
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Microarchitectural denial of service: insuring microarchitectural fairness
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Compiling for instruction cache performance on a multithreaded architecture
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
A framework for modeling and optimization of prescient instruction prefetch
SIGMETRICS '03 Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Improving server software support for simultaneous multithreaded processors
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Mini-Threads: Increasing TLP on Small-Scale SMT Processors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Front-End Policies for Improved Issue Efficiency in SMT Processors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Dynamic Data Dependence Tracking and its Application to Branch Prediction
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Power-Sensitive Multithreaded Architecture
ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
Simultaneous Multithreading-Based Routers
ICPP '00 Proceedings of the Proceedings of the 2000 International Conference on Parallel Processing
Implicitly-multithreaded processors
Proceedings of the 30th annual international symposium on Computer architecture
Transient-fault recovery for chip multiprocessors
Proceedings of the 30th annual international symposium on Computer architecture
An evaluation of speculative instruction execution on simultaneous multithreaded processors
ACM Transactions on Computer Systems (TOCS)
A Clustered Approach to Multithreaded Processors
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
An Experimental Study of Polylogarithmic, Fully Dynamic, Connectivity Algorithms
Journal of Experimental Algorithmics (JEA)
The need for adaptive dynamic thread scheduling
High performance scientific and engineering computing
Predictable performance in SMT processors
Proceedings of the 1st conference on Computing frontiers
ACM Transactions on Computer Systems (TOCS)
The energy efficiency of CMP vs. SMT for multimedia workloads
Proceedings of the 18th annual international conference on Supercomputing
Wire Delay is Not a Problem for SMT (In the Near Future)
Proceedings of the 31st annual international symposium on Computer architecture
Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance
Proceedings of the 31st annual international symposium on Computer architecture
SMTp: An Architecture for Next-generation Scalable Multi-threading
Proceedings of the 31st annual international symposium on Computer architecture
Extended Split-Issue: Enabling Flexibility in the Hardware Implementation of NUAL VLIW DSPs
Proceedings of the 31st annual international symposium on Computer architecture
A study of source-level compiler algorithms for automatic construction of pre-execution code
ACM Transactions on Computer Systems (TOCS)
Late Allocation and Early Release of Physical Registers
IEEE Transactions on Computers
Safely exploiting multithreaded processors to tolerate memory latency in real-time systems
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
A scalable, clustered SMT processor for digital signal processing
MEDEA '03 Proceedings of the 2003 workshop on MEmory performance: DEaling with Applications , systems and architecture
Heat-and-run: leveraging SMT and CMP to manage power density through the operating system
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Architectural Support for Enhanced SMT Job Scheduling
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Area and System Clock Effects on SMT/CMP Throughput
IEEE Transactions on Computers
Dynamically Controlled Resource Allocation in SMT Processors
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Balanced Multithreading: Increasing Throughput via a Low Cost Multithreading Hierarchy
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Scalable cache memory design for large-scale SMT architectures
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Architecture optimization for multimedia application exploiting data and thread-level parallelism
Journal of Systems Architecture: the EUROMICRO Journal
Evaluating the impact of simultaneous multithreading on network servers using real hardware
SIGMETRICS '05 Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Opportunistic Transient-Fault Detection
Proceedings of the 32nd annual international symposium on Computer Architecture
Understanding the energy efficiency of SMT and CMP with multiclustering
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
Architectural support for real-time task scheduling in SMT processors
Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
Virtual multiprocessor: an analyzable, high-performance architecture for real-time computing
Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
Tornado warning: the perils of selective replay in multithreaded processors
Proceedings of the 19th annual international conference on Supercomputing
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
A Mechanism for Online Diagnosis of Hard Faults in Microprocessors
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Multithreaded architectures and the sort benchmark
DaMoN '05 Proceedings of the 1st international workshop on Data management on new hardware
An Instruction Fetch Policy Handling L2 Cache Misses in SMT Processors
HPCASIA '05 Proceedings of the Eighth International Conference on High-Performance Computing in Asia-Pacific Region
A Self-Repairing Prefetcher in an Event-Driven Dynamic Optimization Framework
Proceedings of the International Symposium on Code Generation and Optimization
Database hash-join algorithms on multithreaded computer architectures
Proceedings of the 3rd conference on Computing frontiers
Intelligent memory manager: reducing cache pollution due to memory management functions
Journal of Systems Architecture: the EUROMICRO Journal
Learning-Based SMT Processor Resource Distribution via Hill-Climbing
Proceedings of the 33rd annual international symposium on Computer Architecture
A Low-Power Multithreaded Processor for Software Defined Radio
Journal of VLSI Signal Processing Systems
Spin Detection Hardware for Improved Management of Multithreaded Systems
IEEE Transactions on Parallel and Distributed Systems
Automatic logging of operating system effects to guide application-level architecture simulation
SIGMETRICS '06/Performance '06 Proceedings of the joint international conference on Measurement and modeling of computer systems
Characterization of simultaneous multithreading (SMT) efficiency in POWER5
IBM Journal of Research and Development - POWER5 and packaging
Adaptive reorder buffers for SMT processors
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Predictable Performance in SMT Processors: Synergy between the OS and SMTs
IEEE Transactions on Computers
Throttling-Based Resource Management in High Performance Multithreaded Architectures
IEEE Transactions on Computers
A case study of multi-threading in the embedded space
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Exploiting Operand Availability for Efficient Simultaneous Multithreading
IEEE Transactions on Computers
Fairness and Throughput in Switch on Event Multithreading
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
CAPSULE: Hardware-Assisted Parallel Execution of Component-Based Programs
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Journal of Parallel and Distributed Computing
Register port complexity reduction in wide-issue processors with selective instruction execution
Microprocessors & Microsystems
Using fine grain multithreading for energy efficient computing
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Compacting register file via 2-level renaming and bit-partitioning
Microprocessors & Microsystems
Online diagnosis of hard faults in microprocessors
ACM Transactions on Architecture and Code Optimization (TACO)
Enhancements for hyper-threading technology in the operating system: seeking the optimal scheduling
WIESS'02 Proceedings of the 2nd conference on Industrial Experiences with Systems Software - Volume 2
An L2-miss-driven early register deallocation for SMT processors
Proceedings of the 21st annual international conference on Supercomputing
Fairness enforcement in switch on event multithreading
ACM Transactions on Architecture and Code Optimization (TACO)
Resource area dilation to reduce power density in throughput servers
ISLPED '07 Proceedings of the 2007 international symposium on Low power electronics and design
Optimal Power/Performance Pipeline Depth for SMT in Scaled Technologies
IEEE Transactions on Computers
Do commodity SMT processors need more OS research?
ACM SIGOPS Operating Systems Review
Exploring the performance limits of simultaneous multithreading for memory intensive applications
The Journal of Supercomputing
International Journal of High Performance Computing and Networking
A latency-conscious SMT branch prediction architecture
International Journal of High Performance Computing and Networking
Optimising long-latency-load-aware fetch policies for SMT processors
International Journal of High Performance Computing and Networking
Pipelined hash-join on multithreaded architectures
DaMoN '07 Proceedings of the 3rd international workshop on Data management on new hardware
A SMT-ARM simulator and performance evaluation
SEPADS'06 Proceedings of the 5th WSEAS International Conference on Software Engineering, Parallel and Distributed Systems
Addressing thermal nonuniformity in SMT workloads
ACM Transactions on Architecture and Code Optimization (TACO)
A dynamically reconfigurable cache for multithreaded processors
Journal of Embedded Computing - Issues in embedded single-chip multicore architectures
The shared-thread multiprocessor
Proceedings of the 22nd annual international conference on Supercomputing
SAMOS '08 Proceedings of the 8th international workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
An adaptive resource partitioning algorithm for SMT processors
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Meeting points: using thread criticality to adapt multicore hardware to parallel regions
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
DLL-conscious instruction fetch optimization for SMT processors
Journal of Systems Architecture: the EUROMICRO Journal
Reducing register pressure in SMT processors through L2-miss-driven early register release
ACM Transactions on Architecture and Code Optimization (TACO)
Speculative return address stack management revisited
ACM Transactions on Architecture and Code Optimization (TACO)
Hill-climbing SMT processor resource distribution
ACM Transactions on Computer Systems (TOCS)
A multithreading embedded architecture
DNCOCO'08 Proceedings of the 7th conference on Data networks, communications, computers
An approach on distributed and shared dynamic cache partition
DNCOCO'08 Proceedings of the 7th conference on Data networks, communications, computers
MLP-Aware Runahead Threads in a Simultaneous Multithreading Processor
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
IPC Control for Multiple Real-Time Threads on an In-Order SMT Processor
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Per-thread cycle accounting in SMT processors
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Memory-level parallelism aware fetch policies for simultaneous multithreading processors
ACM Transactions on Architecture and Code Optimization (TACO)
A swarm-inspired resource distribution for SMT processors
Proceedings of the 3rd International Conference on Bio-Inspired Models of Network, Information and Computing Sytems
Issue Mechanism for Embedded Simultaneous Multithreading Processor
IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
A Garbage Collection Technique for Embedded Multithreaded Multicore Processors
ARCS '09 Proceedings of the 22nd International Conference on Architecture of Computing Systems
The impact of speculative execution on SMT processors
International Journal of Parallel Programming
Proceedings of the 36th annual international symposium on Computer architecture
Improving SMT performance: an application of genetic algorithms to configure resizable caches
Proceedings of the 11th Annual Conference Companion on Genetic and Evolutionary Computation Conference: Late Breaking Papers
ACM SIGARCH Computer Architecture News
Energy-efficient register caching with compiler assistance
ACM Transactions on Architecture and Code Optimization (TACO)
Evaluation of Different Multithreaded and Multicore Processor Configurations for SoPC
SAMOS '09 Proceedings of the 9th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
The Impact of Resource Sharing Control on the Design of Multicore Processors
ICA3PP '09 Proceedings of the 9th International Conference on Algorithms and Architectures for Parallel Processing
Paired ROBs: A Cost-Effective Reorder Buffer Sharing Strategy for SMT Processors
Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Architecture Design for Soft Errors
Architecture Design for Soft Errors
MPTLsim: a simulator for X86 multicore processors
Proceedings of the 46th Annual Design Automation Conference
Fixed-priority scheduling on prioritized SMT processor
PDCS '07 Proceedings of the 19th IASTED International Conference on Parallel and Distributed Computing and Systems
Selective replication: A lightweight technique for soft errors
ACM Transactions on Computer Systems (TOCS)
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Probabilistic job symbiosis modeling for SMT processor scheduling
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Dynamic capacity-speed tradeoffs in SMT processor caches
HiPEAC'07 Proceedings of the 2nd international conference on High performance embedded architectures and compilers
Compiler techniques for reducing data cache miss rate on a multithreaded architecture
HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
A predictable simultaneous multithreading scheme for hard real-time
ARCS'08 Proceedings of the 21st international conference on Architecture of computing systems
Soft real-time scheduling on SMT processors with explicit resource allocation
ARCS'08 Proceedings of the 21st international conference on Architecture of computing systems
Forwardflow: a scalable core for power-constrained CMPs
Proceedings of the 37th annual international symposium on Computer architecture
Modeling critical sections in Amdahl's law and its implications for multicore design
Proceedings of the 37th annual international symposium on Computer architecture
ACM Transactions on Architecture and Code Optimization (TACO)
An instruction-systolic programmable shader architecture for multi-threaded 3D graphics processing
Journal of Parallel and Distributed Computing
Dynamically managed multithreaded reconfigurable architectures for chip multiprocessors
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Lithe: enabling efficient composition of parallel libraries
HotPar'09 Proceedings of the First USENIX conference on Hot topics in parallelism
Improving SMT performance scheduling processes
EUROMICRO-PDP'02 Proceedings of the 10th Euromicro conference on Parallel, distributed and network-based processing
Exploiting unbalanced thread scheduling for energy and performance on a CMP of SMT processors
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Compatible phase co-scheduling on a CMP of multi-threaded processors
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Minimal Multi-threading: Finding and Removing Redundant Instructions in Multi-threaded Processors
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Data layout for cache performance on a multithreaded architecture
Transactions on high-performance embedded architectures and compilers III
Predictive coordination of multiple on-chip resources for chip multiprocessors
Proceedings of the international conference on Supercomputing
A study on factors influencing power consumption in multithreaded and multicore CPUs
WSEAS Transactions on Computers
Managing SMT resource usage through speculative instruction window weighting
ACM Transactions on Architecture and Code Optimization (TACO)
ICCSA'11 Proceedings of the 2011 international conference on Computational science and Its applications - Volume Part V
Capacity metric for chip heterogeneous multiprocessors
CODES+ISSS '11 Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
A fault-tolerant, dynamically scheduled pipeline structure for chip multiprocessors
SAFECOMP'11 Proceedings of the 30th international conference on Computer safety, reliability, and security
A phase adaptive cache hierarchy for SMT processors
Microprocessors & Microsystems
Trade-offs in transient fault recovery schemes for redundant multithreaded processors
HiPC'06 Proceedings of the 13th international conference on High Performance Computing
Supporting speculative multithreading on simultaneous multithreaded processors
HiPC'06 Proceedings of the 13th international conference on High Performance Computing
Enhancing ICOUNT2.8 fetch policy with better fairness for SMT processors
ACSAC'06 Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture
Do trace cache, value prediction and prefetching improve SMT throughput?
ARCS'06 Proceedings of the 19th international conference on Architecture of Computing Systems
Next generation embedded processor architecture for personal information devices
EUC'06 Proceedings of the 2006 international conference on Embedded and Ubiquitous Computing
A fetch policy maximizing throughput and fairness for two-context SMT processors
APPT'05 Proceedings of the 6th international conference on Advanced Parallel Processing Technologies
APPT'05 Proceedings of the 6th international conference on Advanced Parallel Processing Technologies
Enhancing DCache warn fetch policy for SMT processors
ISPA'05 Proceedings of the Third international conference on Parallel and Distributed Processing and Applications
2L-MuRR: a compact register renaming scheme for SMT processors
ISPA'05 Proceedings of the Third international conference on Parallel and Distributed Processing and Applications
An in-order SMT architecture with static resource partitioning for consumer applications
PDCAT'04 Proceedings of the 5th international conference on Parallel and Distributed Computing: applications and Technologies
Parallel job scheduling — a status report
JSSPP'04 Proceedings of the 10th international conference on Job Scheduling Strategies for Parallel Processing
Low power microprocessor design for embedded systems
ICCSA'06 Proceedings of the 2006 international conference on Computational Science and Its Applications - Volume Part IV
Improving accuracy of perceptron predictor through correlating data values in SMT processors
ISNN'05 Proceedings of the Second international conference on Advances in Neural Networks - Volume Part III
ACM Transactions on Computer Systems (TOCS)
Probabilistic modeling for job symbiosis scheduling on SMT processors
ACM Transactions on Architecture and Code Optimization (TACO)
Mixed speculative multithreaded execution models
ACM Transactions on Architecture and Code Optimization (TACO)
APC: a performance metric of memory systems
ACM SIGMETRICS Performance Evaluation Review
A bypass mechanism to enhance branch predictor for SMT processors
ACSAC'07 Proceedings of the 12th Asia-Pacific conference on Advances in Computer Systems Architecture
Adaptive instruction dispatching techniques for Simultaneous Multi-Threading (SMT) processors
Computers and Electrical Engineering
FROCM: a fair and low-overhead method in SMT processor
HPCC'07 Proceedings of the Third international conference on High Performance Computing and Communications
ACM Transactions on Architecture and Code Optimization (TACO)
Scheduling optimization in multicore multithreaded microprocessors through dynamic modeling
Proceedings of the ACM International Conference on Computing Frontiers
Enabling fair pricing on HPC systems with node sharing
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Computers and Electrical Engineering
ACM Transactions on Architecture and Code Optimization (TACO)
The benefit of SMT in the multi-core era: flexibility towards degrees of thread-level parallelism
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
GPUfs: Integrating a file system with GPUs
ACM Transactions on Computer Systems (TOCS)
Hi-index | 0.02 |
Simultaneous multithreading is a technique that permits multiple independent threads to issue multiple instructions each cycle. In previous work we demonstrated the performance potential of simultaneous multithreading, based on a somewhat idealized model. In this paper we show that the throughput gains from simultaneous multithreading can be achieved without extensive changes to a conventional wide-issue superscalar, either in hardware structures or sizes. We present an architecture for simultaneous multithreading that achieves three goals: (1) it minimizes the architectural impact on the conventional superscalar design, (2) it has minimal performance impact on a single thread executing alone, and (3) it achieves significant throughput gains when running multiple threads. Our simultaneous multithreading architecture achieves a throughput of 5.4 instructions per cycle, a 2.5-fold improvement over an unmodified superscalar with similar hardware resources. This speedup is enhanced by an advantage of multithreading previously unexploited in other architectures: the ability to favor for fetch and issue those threads most efficiently using the processor each cycle, thereby providing the "best" instructions to the processor.