PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Using profile information to assist classic code optimizations
Software—Practice & Experience
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Olden: parallelizing programs with dynamic data structures on distributed-memory machines
Olden: parallelizing programs with dynamic data structures on distributed-memory machines
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Interprocedural conditional branch elimination
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Predicting data cache misses in non-numeric applications through correlation profiling
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Advanced compiler design and implementation
Advanced compiler design and implementation
Load latency tolerance in dynamically scheduled processors
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
A Unified Approach to Path Problems
Journal of the ACM (JACM)
Fast Algorithms for Solving Path Problems
Journal of the ACM (JACM)
Understanding the backward slices of performance degrading instructions
Proceedings of the 27th annual international symposium on Computer architecture
Probabilistic data flow system with two-edge profiling
DYNAMO '00 Proceedings of the ACM SIGPLAN workshop on Dynamic and adaptive compilation and optimization
Slice-processors: an implementation of operation-based prediction
ICS '01 Proceedings of the 15th international conference on Supercomputing
Execution-based prediction using speculative slices
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Speculative precomputation: long-range prefetching of delinquent loads
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Post-pass binary adaptation for software-based speculative precomputation
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Difficult-path branch prediction using subordinate microthreads
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Automatically characterizing large scale program behavior
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Design and evaluation of compiler algorithms for pre-execution
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Dataflow Frequency Analysis Based on Whole Program Paths
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
A quantitative framework for automated pre-execution thread selection
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
A framework for modeling and optimization of prescient instruction prefetch
SIGMETRICS '03 Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Speculative Data-Driven Multithreading
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Optimizing the cache performance of non-numeric applications
Optimizing the cache performance of non-numeric applications
Physical Experimentation with Prefetching Helper Threads on Intel's Hyper-Threaded Processors
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Interprocedural Probabilistic Pointer Analysis
IEEE Transactions on Parallel and Distributed Systems
Helper threads via virtual multithreading on an experimental itanium® 2 processor-based platform
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Energy-Effectiveness of Pre-Execution and Energy-Aware P-Thread Selection
Proceedings of the 32nd annual international symposium on Computer Architecture
Hi-index | 0.00 |
This paper investigates helper threads that improve performance by prefetching data on behalf of an application's main thread. The focus is data prefetch helper threads that lack branch instructions and which generate prefetches for one dynamic instance of a delinquent load instruction per spawned helper thread. This form of helper thread, some-times called a simple p-thread, has been studied previously by Roth et al. [29, 26] who proposed a framework for optimizing their impact. A key step in that framework is predicting the performance impact of a helper thread. In this paper we propose and evaluate a novel performance prediction technique that achieves comparable results yet requires less detailed information about dynamic program behavior. This technique extends a path expression based statistical modeling framework [2] by incorporating information about branch correlation (which we show is important) and by considering data flow information in a statistical manner. Significantly, the profile information we use is similar to that provided within current optimizing compilers. This paper also provides the first comprehensive assessment of the sources of modeling error relevant to predicting the performance impact of simple p-threads.