Run-time disambiguation: coping with statically unpredictable dependencies
IEEE Transactions on Computers
Efficient and exact data dependence analysis
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Effective compiler support for predicated execution using the hyperblock
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Efficient flow-sensitive interprocedural computation of pointer-induced aliases and side effects
POPL '93 Proceedings of the 20th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
The superblock: an effective technique for VLIW and superscalar compilation
The Journal of Supercomputing - Special issue on instruction-level parallelism
Context-sensitive interprocedural points-to analysis in the presence of function pointers
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
ICS '94 Proceedings of the 8th international conference on Supercomputing
Speculative disambiguation: a compilation technique for dynamic memory disambiguation
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Dynamic memory disambiguation for array references
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Dynamic memory disambiguation using the memory conflict buffer
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Efficient context-sensitive pointer analysis for C programs
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Fast and accurate flow-insensitive points-to analysis
Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Simplification of array access patterns for compiler optimizations
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Constraint-based array dependence analysis
ACM Transactions on Programming Languages and Systems (TOPLAS)
Evaluation of predicated array data-flow analysis for automatic parallelization
Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Modular interprocedural pointer analysis using access paths: design, implementation, and evaluation
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Dynamic memory interval test vs. interprocedural pointer analysis in multimedia applications
ACM Transactions on Architecture and Code Optimization (TACO)
Runtime dependency analysis for loop pipelining in high-level synthesis
Proceedings of the 50th Annual Design Automation Conference
Hi-index | 0.00 |
Frequently, ambiguous memory references prevent the compiler from exploiting all the available ILP. Techniques to detect aliasing between access patterns of array elements are quite effective for many numeric applications, but although media codes usually process disjointed streams that exhibit regular access patterns, current commercial compilers remain unsuccessful in disambiguating them due mainly to complex pointer references. In this paper we propose a very cost effective disambiguation method that takes advantage of the specific behavior of typical media memory patterns. The compiler generates two versions of the same loop and a simple test block that decides at run-time whether or not the entire loop is disambiguated. No additional hardware is required and the increase in compilation time and code size is minimal. We have introduced this technique in Trimaran and evaluated it for a VLIW architecture with guarded execution. Experimental results confirm significant speedups (up to 1.32X for a 4-way architecture) for a relevant percentage of applications from the Mediabench benchmark suite. Furthermore, performance scales up to a 16-way machine (up to 1.73X versus the 16-way baseline.