Exploring the capacity of a modern SMT architecture to deliver high scientific application performance

Authors:
Evangelia Athanasaki;Nikos Anastopoulos;Kornilios Kourtis;Nectarios Koziris
Affiliations:
School of Electrical and Computer Engineering, National Technical University of Athens;School of Electrical and Computer Engineering, National Technical University of Athens;School of Electrical and Computer Engineering, National Technical University of Athens;School of Electrical and Computer Engineering, National Technical University of Athens
Venue:
HPCC'06 Proceedings of the Second international conference on High Performance Computing and Communications
Year:
2006

Abstract

Simultaneous multithreading (SMT) has been proposed to improve system throughput by overlapping instructions from multiple threads on a single wide-issue processor. Recent studies have demonstrated that heterogeneity among simultaneously executed applications can yield significant performance gains under SMT. However, the speedup of a single application that is parallelized into multiple threads is often sensitive to its inherent instruction-level parallelism (ILP), as well as to the efficiency of the synchronization and communication mechanisms between its separate, but possibly dependent, threads. In this paper, we explore the performance limits by evaluating the tradeoffs between ILP and thread-level parallelism (TLP) for various kinds of instruction streams. We evaluate and contrast speculative precomputation (SPR) and TLP techniques for a series of scientific codes executed on an SMT processor. We also examine the effect of thread synchronization mechanisms on multithreaded parallel applications that are executed on a single SMT processor. To support this evaluation, we also present results gathered from the performance monitoring hardware of the processor.