Characterization of alpha AXP performance using TP and SPEC workloads
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Contrasting characteristics and cache performance of technical and multi-user commercial workloads
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Hitting the memory wall: implications of the obvious
ACM SIGARCH Computer Architecture News
Simultaneous multithreading: maximizing on-chip parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Evaluation of multithreaded uniprocessors for commercial application environments
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Initial Observations of the Simultaneous Multithreading Pentium 4 Processor
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
POWER5 System microarchitecture
IBM Journal of Research and Development - POWER5 and packaging
POWER3: the next generation of PowerPC processors
IBM Journal of Research and Development
A multithreaded PowerPC processor for commercial servers
IBM Journal of Research and Development
POWER4 system microarchitecture
IBM Journal of Research and Development
Scalability of the Nutch search engine
Proceedings of the 21st annual international conference on Supercomputing
Hard real-time performances in multiprocessor-embedded systems using ASMP-Linux
EURASIP Journal on Embedded Systems - Operating System Support for Embedded Real-Time Applications
Zero loads: canceling load requests by tracking zero values
Proceedings of the 9th workshop on MEmory performance: DEaling with Applications, systems and architecture
Toward a multicore architecture for real-time ray-tracing
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Multiplexed redundant execution: a technique for efficient fault tolerance in chip multiprocessors
Proceedings of the Conference on Design, Automation and Test in Europe
Diagnosis, Tuning, and Redesign for Multicore Performance: A Case Study of the Fast Multipole Method
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Analyzing the effects of hyperthreading on the performance of data management systems
International Journal of Parallel Programming
Hi-index | 0.00 |
Coarse-grained multithreading, the switching of threads to avoid idle processor time during long-latency events, has been available on IBM systems since 1998. Simultaneous multithreading (SMT), first available on the POWER5TM processor, moves beyond simple thread switching to the maintenance of two thread streams that are issued as continuously as possible to ensure the maximum use of processor resources. Because SMT has the potential of increasing processor officiency and correspondingly increasing the amount of work done for a given time span, the reader might suppose that SMT would exhibit a performance gain for all workloads. This is true for most workloads, but is not true in some exceptional cases. In SMT mode, the processor resources--register sets, caches, queues, translation buffers, and the system memory nest--must be shared by both threads, and conditions can occur that degrade or even obviate SMT performance improvement. The POWER4TM and POWER5 processors have very powerful performance monitor (PM) toolsets that can help the user to determine what is occurring in workloads that may not be providing expected SMT gains. In this paper, the results of measured differences among workloads having large, medium, small, and even negative SMT performance gains are presented along with an approach to investigating workloads to determine the source of SMT performance gain limits.