An elementary processor architecture with simultaneous instruction issuing from multiple threads
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Processor coupling: integrating compile time and runtime scheduling for parallelism
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
The multiflow trace scheduling compiler
The Journal of Supercomputing - Special issue on instruction-level parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Increasing superscalar performance through multistreaming
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
The case for a single-chip multiprocessor
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
ICS '90 Proceedings of the 4th international conference on Supercomputing
Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading
ACM Transactions on Computer Systems (TOCS)
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
The multicluster architecture: reducing cycle time through partitioning
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Performance Study of a Multithreaded Superscalar Microprocessor
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques
Tuning compiler optimizations for simultaneous multithreading
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
An analysis of database workload performance on simultaneous multithreaded processors
Proceedings of the 25th annual international symposium on Computer architecture
An evaluation of staged run-time optimizations in DyC
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
An Algorithm-Hardware-System Approach to VLIW Multimedia Processors
Journal of VLSI Signal Processing Systems - special issue on multimedia signal processing
Software-Directed Register Deallocation for Simultaneous Multithreaded Processors
IEEE Transactions on Parallel and Distributed Systems
Piranha: a scalable architecture based on single-chip multiprocessing
Proceedings of the 27th annual international symposium on Computer architecture
Tuning Compiler Optimizations for Simultaneous Multithreading
International Journal of Parallel Programming - Special issue on the 30th annual ACM/IEEE international symposium on microarchitecture, part II
The benefits and costs of DyC's run-time optimizations
ACM Transactions on Programming Languages and Systems (TOPLAS)
Analytical cache models with applications to cache partitioning
ICS '01 Proceedings of the 15th international conference on Supercomputing
An analysis of operating system behavior on a simultaneous multithreaded architecture
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Asynchrony in parallel computing: from dataflow to multithreading
Progress in computer research
SMT Layout Overhead and Scalability
IEEE Transactions on Parallel and Distributed Systems
Performance of a micro-threaded pipeline
CRPIT '02 Proceedings of the seventh Asia-Pacific conference on Computer systems architecture
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Asynchrony in parallel computing: from dataflow to multithreading
Progress in computer research
A case for user-level interrupts
ACM SIGARCH Computer Architecture News
Demonstrating the Scalability of a Molecular Dynamics Application on a Petaflops Computer
International Journal of Parallel Programming
Analysis of performance bottlenecks in multithreaded multiprocessor systems
Fundamenta Informaticae - Application of concurrency to system design
A survey of processors with explicit multithreading
ACM Computing Surveys (CSUR)
Boosting SMT Performance by Speculation Control
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
COOL Approach to Petaflops Computing (invited paper)
PaCT '999 Proceedings of the 5th International Conference on Parallel Computing Technologies
A Queuing Model of a Multi-threaded Architecture: A Case Study
PaCT '999 Proceedings of the 5th International Conference on Parallel Computing Technologies
Time-Sharing Parallel Jobs in the Presence of Multiple Resource Requirements
IPDPS '00/JSSPP '00 Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing
Effects of Memory Performance on Parallel Job Scheduling
JSSPP '01 Revised Papers from the 7th International Workshop on Job Scheduling Strategies for Parallel Processing
Improving the Performance of Heterogeneous DSMs via Multithreading
VECPAR '00 Selected Papers and Invited Talks from the 4th International Conference on Vector and Parallel Processing
Dissecting Cyclops: a detailed analysis of a multithreaded architecture
ACM SIGARCH Computer Architecture News
Improving server software support for simultaneous multithreaded processors
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Simultaneous Multithreading-Based Routers
ICPP '00 Proceedings of the Proceedings of the 2000 International Conference on Parallel Processing
Dynamic Partitioning of Shared Cache Memory
The Journal of Supercomputing
The need for adaptive dynamic thread scheduling
High performance scientific and engineering computing
Fighting the memory wall with assisted execution
Proceedings of the 1st conference on Computing frontiers
A retrospective on: "an evaluation of staged run-time optimizations in DyC"
ACM SIGPLAN Notices - Best of PLDI 1979-1999
Architectural Support for Enhanced SMT Job Scheduling
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Area and System Clock Effects on SMT/CMP Throughput
IEEE Transactions on Computers
Optimizing NANOS OpenMP for the IBM Cyclops Multithreaded Architecture
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Journal of Parallel and Distributed Computing
Evaluating the impact of simultaneous multithreading on network servers using real hardware
SIGMETRICS '05 Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Cooperative multithreading on 3mbedded multiprocessor architectures enables energy-scalable design
Proceedings of the 42nd annual Design Automation Conference
Adaptive execution techniques for SMT multiprocessor architectures
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Multithreaded architectures and the sort benchmark
DaMoN '05 Proceedings of the 1st international workshop on Data management on new hardware
An Instruction Fetch Policy Handling L2 Cache Misses in SMT Processors
HPCASIA '05 Proceedings of the Eighth International Conference on High-Performance Computing in Asia-Pacific Region
SCMP: a single-chip message-passing parallel computer
The Journal of Supercomputing - Special issue: Parallel and distributed processing and applications
Database hash-join algorithms on multithreaded computer architectures
Proceedings of the 3rd conference on Computing frontiers
Throttling-Based Resource Management in High Performance Multithreaded Architectures
IEEE Transactions on Computers
Speculative pre-execution assisted by compiler (SPEAR)
Journal of Parallel and Distributed Computing - Special issue on parallel bioinspired algorithms
Thread-associative memory for multicore and multithreaded computing
Proceedings of the 2006 international symposium on Low power electronics and design
Design and evaluation of a hierarchical decoupled architecture
The Journal of Supercomputing
Software—Practice & Experience
Multigrid and Gauss-Seidel smoothers revisited: parallelization on chip multiprocessors
Proceedings of the 20th annual international conference on Supercomputing
Journal of Parallel and Distributed Computing
Compiler-Managed Software-based Redundant Multi-Threading for Transient Fault Detection
Proceedings of the International Symposium on Code Generation and Optimization
The Journal of Supercomputing
Superscalar out-of-order demystified in four instructions
WCAE '03 Proceedings of the 2003 workshop on Computer architecture education: Held in conjunction with the 30th International Symposium on Computer Architecture
Resource area dilation to reduce power density in throughput servers
ISLPED '07 Proceedings of the 2007 international symposium on Low power electronics and design
Design of adaptive multiprocessor on chip systems
Proceedings of the 20th annual conference on Integrated circuits and systems design
ACM SIGARCH Computer Architecture News - Special issue on the 2006 reconfigurable and adaptive architecture workshop
Speeding-up multiprocessors running DBMS workloads through coherence protocols
International Journal of High Performance Computing and Networking
Pipelined hash-join on multithreaded architectures
DaMoN '07 Proceedings of the 3rd international workshop on Data management on new hardware
Exploiting multilevel parallelism using OpenMP on a massive multithreaded architecture
Journal of Embedded Computing - Issues in embedded single-chip multicore architectures
Vision platform for mobile intelligent robot based on 81.6 GOPS object recognition processor
Proceedings of the 45th annual Design Automation Conference
An adaptive resource partitioning algorithm for SMT processors
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
A low-complexity microprocessor design with speculative pre-execution
Journal of Systems Architecture: the EUROMICRO Journal
Improving error tolerance for multithreaded register files
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Issue Mechanism for Embedded Simultaneous Multithreading Processor
IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
The impact of speculative execution on SMT processors
International Journal of Parallel Programming
Implementing AUTOSAR scheduling and resource management on an embedded SMT processor
Proceedings of th 12th International Workshop on Software and Compilers for Embedded Systems
81.6 GOPS object recognition processor based on a memory-centric NoC
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Fixed-priority scheduling on prioritized SMT processor
PDCS '07 Proceedings of the 19th IASTED International Conference on Parallel and Distributed Computing and Systems
Reconsidering algorithms for iterative solvers in the multicore era
International Journal of Computational Science and Engineering
A new multithreaded architecture supporting direct execution of Esterel
EURASIP Journal on Embedded Systems
International Journal of Modelling and Simulation
Adaptive execution techniques of parallel programs for multiprocessors
Journal of Parallel and Distributed Computing
Communication assist for data driven multithreading
PCI'01 Proceedings of the 8th Panhellenic conference on Informatics
Evaluation of OpenMP for the cyclops multithreaded architecture
WOMPAT'03 Proceedings of the OpenMP applications and tools 2003 international conference on OpenMP shared memory parallel programming
Improving the performance of OpenMP by array privatization
WOMPAT'03 Proceedings of the OpenMP applications and tools 2003 international conference on OpenMP shared memory parallel programming
Mt-ADRES: multithreading on coarse-grained reconfigurable architecture
ARC'07 Proceedings of the 3rd international conference on Reconfigurable computing: architectures, tools and applications
Evaluating performance of new quad-core Intel®Xeon®5500 family processors for HPC
PPAM'09 Proceedings of the 8th international conference on Parallel processing and applied mathematics: Part I
Dynamic instruction scheduling in a trace-based multi-threaded architecture
International Journal of Parallel Programming
Chip multithreaded consistency model
Journal of Computer Science and Technology
Temporal isolation on multiprocessing architectures
Proceedings of the 48th Design Automation Conference
A fetch policy maximizing throughput and fairness for two-context SMT processors
APPT'05 Proceedings of the 6th international conference on Advanced Parallel Processing Technologies
APPT'05 Proceedings of the 6th international conference on Advanced Parallel Processing Technologies
Enhancing DCache warn fetch policy for SMT processors
ISPA'05 Proceedings of the Third international conference on Parallel and Distributed Processing and Applications
Virtualization challenges: a view from server consolidation perspective
VEE '12 Proceedings of the 8th ACM SIGPLAN/SIGOPS conference on Virtual Execution Environments
Trends and challenges in operating systems---from parallel computing to cloud computing
Concurrency and Computation: Practice & Experience
Experiments with WRF on intel® many integrated core (intel MIC) architecture
IWOMP'12 Proceedings of the 8th international conference on OpenMP in a Heterogeneous World
Analysis of Performance Bottlenecks in Multithreaded Multiprocessor Systems
Fundamenta Informaticae - Application of Concurrency to System Design
ACM Transactions on Architecture and Code Optimization (TACO)
Multicore-based vector coprocessor sharing for performance and energy gains
ACM Transactions on Embedded Computing Systems (TECS) - Special issue on application-specific processors
Hi-index | 0.01 |
As the processor community prepares for a billion transistors on a chip, researchers continue to debate the most effective way to use them. One approach is to add more memory (either cache or primary memory) to the chip, but the performance gain from memory alone is limited. Another approach is to increase the level of systems integration, bringing support functions like graphics accelerators and I/O controllers on chip. Although integration lowers system costs and communication latency, the overall performance gain to applications is again marginal. We believe the only way to significantly improve performance is to enhance the processor's computational capabilities. In general, this means increasing parallelism-in all its available forms. At present only certain forms of parallelism are being exploited. Current superscalars, for example, can execute four or more instructions per cycle; in practice, however, they achieve only one or two, because current applications have low instruction-level parallelism. Placing multiple superscalar processors on a chip is also not an effective solution, because, in addition to the low instruction-level parallelism, performance suffers when there is little thread-level parallelism. A better solution is to design a processor that can exploit all types of parallelism well. Simultaneous multithreading is a processor design that meets this goal, because it consumes both thread-level and instruction-level parallelism. In SMT processors, thread-level parallelism can come from either multithreaded, parallel programs or individual, independent programs in a multiprogramming workload. Instruction-level parallelism comes from each single program or thread. Because it successfully (and simultaneously) exploits both types of parallelism, SMT processors use resources more efficiently, and both instruction throughput and speedups are greater.