Traditional alias analysis is expensive and ineffective for dynamic optimizations. In practice, dynamic optimization systems perform memory optimizations speculatively and rely on hardware, such as alias registers, to detect memory aliases at runtime. Existing hardware alias detection schemes either cannot scale up to a large number of alias registers or may introduce false positives. Order-based alias detection overcomes these limitations, but it raises considerable challenges: software must manage the alias register queue efficiently, and the scheme imposes restrictions on optimizations. In this paper, we present SMARQ, a Software-Managed Alias Register Queue, which manages the alias register queue efficiently and supports more aggressive speculative optimizations. We conducted experiments with a dynamic optimization system on a VLIW processor that has 64 alias registers. Experiments on a suite of SPECFP2000 benchmarks show that SMARQ improves overall performance by 39% compared to the case without hardware alias detection. Scaling the number of alias registers from 16 to 64 improves performance by 10%. Compared to a technique with false positives (similar to Itanium), SMARQ improves performance by 13%. To reduce the chance of alias register overflow, the novel alias register allocation algorithm in SMARQ reduces the alias register working set by 74% compared to a straightforward alias register allocation based on program order.