Multiplication by Integer constants
Software—Practice & Experience
“Combining” as a compilation technique for VLIW architectures
MICRO 22 Proceedings of the 22nd annual workshop on Microprogramming and microarchitecture
Division by invariant integers using multiplication
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Memory access coalescing: a technique for eliminating redundant memory accesses
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Optimal code motion: theory and practice
ACM Transactions on Programming Languages and Systems (TOPLAS)
Advanced compiler design and implementation
Advanced compiler design and implementation
ACM Transactions on Programming Languages and Systems (TOPLAS)
Exploiting superword level parallelism with multimedia instruction sets
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Partial redundancy elimination for access path expressions
Software—Practice & Experience - Special issue on aliasing in object-oriented systems
Effective null pointer check elimination utilizing hardware trap
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Preference-directed graph coloring
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Effective sign extension elimination
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
The Java Language Specification
The Java Language Specification
Compiler-Controlled Caching in Superword Register Files for Multimedia Extension Architectures
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Unified Analysis of Array and Object References in Strongly Typed Languages
SAS '00 Proceedings of the 7th International Symposium on Static Analysis
CC '98 Proceedings of the 7th International Conference on Compiler Construction
Path Profile Guided Partial Redundancy Elimination Using Speculation
ICCL '98 Proceedings of the 1998 International Conference on Computer Languages
Partial Redundancy Elimination Driven by a Cost-Benefit Analysis
ICCSSE '97 Proceedings of the 8th Israeli Conference on Computer-Based Systems and Software Engineering
Mostly concurrent garbage collection revisited
OOPSLA '03 Proceedings of the 18th annual ACM SIGPLAN conference on Object-oriented programing, systems, languages, and applications
Efficient spill code for SDRAM
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Partial redundancy elimination for access expressions by speculative code motion
Software—Practice & Experience
Hi-index | 0.00 |
Instruction combining is an optimization to replace a sequence of instructions with a more efficient instruction yielding the same result in a fewer machine cycles. When we use it for coalescing memory accesses, we can reduce the memory traffic by combining narrow memory references with contiguous addresses into a wider reference for taking advantage of a wide-bus architecture. Coalescing memory accesses can improve performance for two reasons: one by reducing the additional cycles required for moving data from caches to registers and the other by reducing the stall cycles caused by multiple outstanding memory access requests. Previous approaches for memory access coalescing focus only on array access instructions related to loop induction variables, and thus they miss many other opportunities. In this paper, we propose a new algorithm for instruction combining by applying global code motion to wider regions of the given program in search of more potential candidates. We implemented two optimizations for coalescing memory accesses, one combining two 32-bit integer loads and the other combining two single-precision floating-point loads, using our algorithm in the IBM Java™ JIT compiler for IA-64, and evaluated them by measuring the SPECjvm98 benchmark suite. In our experiment, we can improve the maximum performance by 5.5% with little additional compilation time overhead. Moreover, when we replace every declaration of double for an instance variable with float, we can improve the performance by 7.3% for the MolDyn benchmark in the JavaGrande benchmark suite. Our approach can be applied to a variety of architectures and to programming languages besides Java.