A new approach to the maximum-flow problem
Journal of the ACM (JACM)
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
A safe approximate algorithm for interprocedural aliasing
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Hybrid slicing: an approach for refining static slices using dynamic information
SIGSOFT '95 Proceedings of the 3rd ACM SIGSOFT symposium on Foundations of software engineering
Simultaneous multithreading: maximizing on-chip parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Olden: parallelizing programs with dynamic data structures on distributed-memory machines
Olden: parallelizing programs with dynamic data structures on distributed-memory machines
SUIF Explorer: an interactive and interprocedural parallelizer
Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Understanding the backward slices of performance degrading instructions
Proceedings of the 27th annual international symposium on Computer architecture
Slice-processors: an implementation of operation-based prediction
ICS '01 Proceedings of the 15th international conference on Supercomputing
On the importance of points-to analysis and other memory disambiguation methods for C programs
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Execution-based prediction using speculative slices
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Speculative precomputation: long-range prefetching of delinquent loads
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Data prefetching by dependence graph precomputation
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Dynamic speculative precomputation
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Introducing the IA-64 Architecture
IEEE Micro
Itanium Processor Microarchitecture
IEEE Micro
The Intel IA-64 Compiler Code Generator
IEEE Micro
Speculative Data-Driven Multithreading
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Compile-time scheduling and optimization for asynchronous machines (multiprocessor, compiler, parallel processing)
Combinatorial Algorithms: Theory and Practice
Combinatorial Algorithms: Theory and Practice
Design and evaluation of compiler algorithms for pre-execution
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
A quantitative framework for automated pre-execution thread selection
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
A framework for modeling and optimization of prescient instruction prefetch
SIGMETRICS '03 Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Physical Experimentation with Prefetching Helper Threads on Intel's Hyper-Threaded Processors
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
ACM Transactions on Computer Systems (TOCS)
Effective stream-based and execution-based data prefetching
Proceedings of the 18th annual international conference on Supercomputing
Data forwarding through in-memory precomputation threads
Proceedings of the 18th annual international conference on Supercomputing
A study of source-level compiler algorithms for automatic construction of pre-execution code
ACM Transactions on Computer Systems (TOCS)
Helper threads via virtual multithreading on an experimental itanium® 2 processor-based platform
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Compiler orchestrated prefetching via speculation and predication
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Helper Threads via Virtual Multithreading
IEEE Micro
Unpredication, Unscheduling, Unspeculation: Reverse Engineering Itanium Executables
IEEE Transactions on Software Engineering
LCR '04 Proceedings of the 7th workshop on Workshop on languages, compilers, and run-time support for scalable systems
Energy-Effectiveness of Pre-Execution and Energy-Aware P-Thread Selection
Proceedings of the 32nd annual international symposium on Computer Architecture
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Design and Implementation of a Compiler Framework for Helper Threading on Multi-core Processors
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Dynamic Helper Threaded Prefetching on the Sun UltraSPARC CMP Processor
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Speculative pre-execution assisted by compiler (SPEAR)
Journal of Parallel and Distributed Computing - Special issue on parallel bioinspired algorithms
Accelerating sequential programs on Chip Multiprocessors via Dynamic Prefetching Thread
Microprocessors & Microsystems
Hybrid multi-core architecture for boosting single-threaded performance
ACM SIGARCH Computer Architecture News
Optimization of data prefetch helper threads with path-expression based statistical modeling
Proceedings of the 21st annual international conference on Supercomputing
Unified control flow and data dependence traces
ACM Transactions on Architecture and Code Optimization (TACO)
Data access history cache and associated data prefetching mechanisms
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Server-based data push architecture for multi-processor environments
Journal of Computer Science and Technology
Profiler and compiler assisted adaptive I/O prefetching for shared storage caches
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
A low-complexity microprocessor design with speculative pre-execution
Journal of Systems Architecture: the EUROMICRO Journal
A compiler-directed data prefetching scheme for chip multiprocessors
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Exploiting Speculative TLP in Recursive Programs by Dynamic Thread Prediction
CC '09 Proceedings of the 18th International Conference on Compiler Construction: Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009
Fast Track: A Software System for Speculative Program Optimization
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
ICA3PP '09 Proceedings of the 9th International Conference on Algorithms and Architectures for Parallel Processing
COMPASS: a programmable data prefetcher using idle GPU shaders
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
ACM Transactions on Architecture and Code Optimization (TACO)
Software data spreading: leveraging distributed caches to improve single thread performance
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
MTPP'10 Proceedings of the Second Russia-Taiwan conference on Methods and tools of parallel programming multicomputers
Inter-core prefetching for multicore processors using migrating helper threads
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
A low-complexity issue queue design with speculative pre-execution
HiPC'05 Proceedings of the 12th international conference on High Performance Computing
A hybrid hardware/software generated prefetching thread mechanism on chip multiprocessors
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
smt-SPRINTS: software precomputation with intelligent streaming for resource-constrained SMTs
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Coalition threading: combining traditional andnon-traditional parallelism to maximize scalability
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Load-balanced pipeline parallelism
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Hi-index | 0.00 |
Recently, a number of thread-based prefetching techniques have been proposed. These techniques aim at improving the latency of single-threaded applications by leveraging multithreading resources to perform memory prefetching via speculative prefetch threads. Software-based speculative precomputation (SSP) is one such technique, proposed for multithreaded Itanium models. SSP does not require expensive hardware support-instead it relies on the compiler to adapt binaries to perform prefetching on otherwise idle hardware thread contexts at run time. This paper presents a post-pass compilation tool for generating SSP-enhanced binaries. The tool is able to: (1) analyze a single-threaded application to generate prefetch threads; (2) identify and embed trigger points in the original binary; and (3) produce a new binary that has the prefetch threads attached. The execution of the new binary spawns the speculative prefetch threads, which are executed concurrently with the main thread. Our results indicate that for a set of pointer-intensive benchmarks, the prefetching performed by the speculative threads achieves an average of 87% speedup on an in-order processor and 5% speedup on an out-of-order processor.