Dynamic Helper Threaded Prefetching on the Sun UltraSPARC CMP Processor

Authors:
Jiwei Lu;Abhinav Das;Wei-Chung Hsu;Khoa Nguyen;Santosh G. Abraham
Affiliations:
University of Minnesota, Twin Cities;University of Minnesota, Twin Cities;University of Minnesota, Twin Cities;Sun Microsystems Inc.;Sun Microsystems Inc.
Venue:
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Year:
2005

Abstract

Data prefetching via helper threading has been extensively investigated on Simultaneous Multi-Threading (SMT) or Virtual Multi-Threading (VMT) architectures. Although helper threads can reportedly hide large cache latencies at runtime, most techniques rely on hardware support to reduce the context-switch overhead between the main thread and the helper thread, and on static profile feedback to construct the helper thread code. This paper develops a new solution that applies helper-threaded prefetching through dynamic optimization on the latest UltraSPARC Chip-Multiprocessing (CMP) processor. Our experiments show that, by utilizing an otherwise idle processor core, a single user-level helper thread is sufficient to improve the runtime performance of the main thread without triggering multiple thread slices. Moreover, since the cores in the CMP are physically decoupled, contention introduced by helper threading is minimal. This paper also discusses several key technical challenges of building a lightweight dynamic optimization/software scouting system on the UltraSPARC/Solaris platform.
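
The general idea of a single user-level helper thread that runs ahead of the main thread can be illustrated with a minimal sketch. This is not the system described in the paper: the names `node_t`, `build_list`, `helper_prefetch`, and `sum_with_helper` are hypothetical, the helper is written by hand rather than generated by a dynamic optimizer, and the sketch assumes a POSIX threads environment and a GCC-compatible compiler that provides `__builtin_prefetch`. Binding the helper to a second core and throttling how far ahead it runs are both omitted.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical pointer-chasing structure used only for illustration. */
typedef struct node {
    struct node *next;
    long value;
} node_t;

typedef struct {
    const node_t *head;  /* where the helper starts walking */
    size_t touched;      /* number of nodes the helper visited */
} helper_arg_t;

/* Build an n-node list with values 1..n in one allocation.
 * The returned head is also the allocation base, so free(head) releases it. */
node_t *build_list(size_t n) {
    assert(n > 0);
    node_t *nodes = malloc(n * sizeof *nodes);
    if (nodes == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        nodes[i].value = (long)(i + 1);
        nodes[i].next = (i + 1 < n) ? &nodes[i + 1] : NULL;
    }
    return nodes;
}

/* Helper thread: runs only the address-generating part of the loop
 * (following next pointers) and hints each upcoming node to the cache.
 * It does none of the main thread's arithmetic. */
static void *helper_prefetch(void *p) {
    helper_arg_t *arg = p;
    for (const node_t *n = arg->head; n != NULL; n = n->next) {
        __builtin_prefetch(n->next, 0, 1);  /* read access, low temporal locality */
        arg->touched++;
    }
    return NULL;
}

/* Main thread: performs the real work while the helper walks the same list.
 * If the helper cannot be created, the main loop still runs correctly. */
long sum_with_helper(const node_t *head, size_t *touched) {
    helper_arg_t arg = { head, 0 };
    pthread_t tid;
    int started = (pthread_create(&tid, NULL, helper_prefetch, &arg) == 0);

    long sum = 0;
    for (const node_t *n = head; n != NULL; n = n->next) {
        sum += n->value;
    }

    if (started) {
        pthread_join(tid, NULL);
    }
    if (touched != NULL) {
        *touched = arg.touched;
    }
    return sum;
}
```

The helper only reads shared data and never affects the result, so the main thread stays correct whether or not the helper runs; any benefit depends on the two threads sharing a cache level and on the helper staying ahead of the main thread, neither of which this sketch enforces or measures.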