Strategies for achieving improved processor throughput
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Alpha architecture reference manual
Alpha architecture reference manual
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Register relocation: flexible contexts for multithreading
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Threads primer: a guide to multithreaded programming
Threads primer: a guide to multithreaded programming
Simultaneous multithreading: maximizing on-chip parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Improving data cache performance by pre-executing instructions under a cache miss
ICS '97 Proceedings of the 11th international conference on Supercomputing
Dependence based prefetching for linked data structures
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Simultaneous subordinate microthreading (SSMT)
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
APRIL: a processor architecture for multiprocessing
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Slice-processors: an implementation of operation-based prediction
ICS '01 Proceedings of the 15th international conference on Supercomputing
Execution-based prediction using speculative slices
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Speculative precomputation: long-range prefetching of delinquent loads
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Data prefetching by dependence graph precomputation
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Post-pass binary adaptation for software-based speculative precomputation
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Difficult-path branch prediction using subordinate microthreads
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Dynamic speculative precomputation
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Design and evaluation of compiler algorithms for pre-execution
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
The Intel IA-64 Compiler Code Generator
IEEE Micro
Evaluation of Multithreaded Processors and Thread-Switch Policies
ISHPC '97 Proceedings of the International Symposium on High Performance Computing
Comparative Evaluation of Latency Tolerance Techniques for Software Distributed Shared Memory
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Supporting Fine-Grained Synchronization on a Simultaneous Multithreading Processor
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Mini-Threads: Increasing TLP on Small-Scale SMT Processors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Speculative Data-Driven Multithreading
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Physical Experimentation with Prefetching Helper Threads on Intel's Hyper-Threaded Processors
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Static Identification of Delinquent Loads
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Hardware Support for Prescient Instruction Prefetch
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
A multithreaded PowerPC processor for commercial servers
IBM Journal of Research and Development
Thread coloring: a scheduler proposal from user to hardware threads
ACM SIGOPS Operating Systems Review
Energy-Effectiveness of Pre-Execution and Energy-Aware P-Thread Selection
Proceedings of the 32nd annual international symposium on Computer Architecture
Dynamic Helper Threaded Prefetching on the Sun UltraSPARC CMP Processor
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Multiple Instruction Stream Processor
Proceedings of the 33rd annual international symposium on Computer Architecture
Improving the performance and power efficiency of shared helpers in CMPs
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Hardware support for software controlled multithreading
ACM SIGARCH Computer Architecture News
Proceedings of the 21st annual international conference on Supercomputing
Optimization of data prefetch helper threads with path-expression based statistical modeling
Proceedings of the 21st annual international conference on Supercomputing
Exploiting Speculative TLP in Recursive Programs by Dynamic Thread Prediction
CC '09 Proceedings of the 18th International Conference on Compiler Construction: Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009
Sprint: speculative prefetching of remote data
Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
Hi-index | 0.00 |
Helper threading is a technology to accelerate a program by exploiting a processor's multithreading capability to run ``assist'' threads. Previous experiments on hyper-threaded processors have demonstrated significant speedups by using helper threads to prefetch hard-to-predict delinquent data accesses. In order to apply this technique to processors that do not have built-in hardware support for multithreading, we introduce virtual multithreading (VMT), a novel form of switch-on-event user-level multithreading, capable of fly-weight multiplexing of event-driven thread executions on a single processor without additional operating system support. The compiler plays a key role in minimizing synchronization cost by judiciously partitioning register usage among the user-level threads. The VMT approach makes it possible to launch dynamic helper thread instances in response to long-latency cache miss events, and to run helper threads in the shadow of cache misses when the main thread would be otherwise stalled.The concept of VMT is prototyped on an Itanium ® 2 processor using features provided by the Processor Abstraction Layer (PAL) firmware mechanism already present in currently shipping processors. On a 4-way MP physical system equipped with VMT-enabled Itanium 2 processors, helper threading via the VMT mechanism can achieve significant performance gains for a diverse set of real-world workloads, ranging from single-threaded workstation benchmarks to heavily multithreaded large scale decision support systems (DSS) using the IBM DB2 Universal Database. We measure a wall-clock speedup of 5.8% to 38.5% for the workstation benchmarks, and 5.0% to 12.7% on various queries in the DSS workload.