Dependence based prefetching for linked data structures
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Simultaneous subordinate microthreading (SSMT)
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Improving virtual function call target prediction via dependence-based pre-computation
ICS '99 Proceedings of the 13th international conference on Supercomputing
Dynamo: a transparent dynamic optimization system
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Slice-processors: an implementation of operation-based prediction
ICS '01 Proceedings of the 15th international conference on Supercomputing
Focusing processor policies via critical-path prediction
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Post-pass binary adaptation for software-based speculative precomputation
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Using a user-level memory thread for correlation prefetching
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Difficult-path branch prediction using subordinate microthreads
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Dynamic speculative precomputation
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Design and evaluation of compiler algorithms for pre-execution
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Speculative Data-Driven Multithreading
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Pre-execution via speculative data-driven multithreading
Pre-execution via speculative data-driven multithreading
A framework for modeling and optimization of prescient instruction prefetch
SIGMETRICS '03 Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Data forwarding through in-memory precomputation threads
Proceedings of the 18th annual international conference on Supercomputing
A study of source-level compiler algorithms for automatic construction of pre-execution code
ACM Transactions on Computer Systems (TOCS)
LCR '04 Proceedings of the 7th workshop on Workshop on languages, compilers, and run-time support for scalable systems
Energy-Effectiveness of Pre-Execution and Energy-Aware P-Thread Selection
Proceedings of the 32nd annual international symposium on Computer Architecture
Speculative pre-execution assisted by compiler (SPEAR)
Journal of Parallel and Distributed Computing - Special issue on parallel bioinspired algorithms
Future execution: A prefetching mechanism that uses multiple cores to speed up single threads
ACM Transactions on Architecture and Code Optimization (TACO)
Optimization of data prefetch helper threads with path-expression based statistical modeling
Proceedings of the 21st annual international conference on Supercomputing
A low-complexity microprocessor design with speculative pre-execution
Journal of Systems Architecture: the EUROMICRO Journal
smt-SPRINTS: software precomputation with intelligent streaming for resource-constrained SMTs
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Hi-index | 0.00 |
Pre-execution attacks cache misses for which address prediction driven prefetching fails. In pre-execution, copies of cache miss computations are isolated from the main program and launched as separate threads called p-threads whenever the processor anticipates an upcoming miss. P-thread selection is the task of deciding what computations should execute as p-threads and when they should be launched such that total execution time is minimized. It is central to the success of pre-execution.We introduce a framework for automated static p-thread selection, a static p-thread being one whose dynamic instances are repeatedly launched during course of program execution. Our approach is to formalize the problem quantitatively and then apply standard techniques to solve it analytically. The framework has two novel components. The slice tree is a data structure that compactly represents a set of static p-threads and the relationships among them. Aggregate advantage is a formula that uses raw program statistics and computation structure to assign each candidate static p-thread a numeric score based on estimated latency tolerance and overhead aggregated over its expected dynamic executions.We use the framework to select p-threads that cover L2 misses and study its effectiveness under different conditions via detailed simulation. We measure the effect of constraining p-thread length, locally optimizing p-threads, using different program samples as a statistical basis selection, and varying several machine parameters. Our framework responds to these changes in an intuitive way. We also validate that aggregate advantage correctly models actual pre-execution.