Instruction combining for coalescing memory accesses using global code motion

  • Authors:
  • Motohiro Kawahito, Hideaki Komatsu, Toshio Nakatani

  • Affiliations:
  • IBM Tokyo Research Laboratory, Shimotsuruma, Yamato, Kanagawa, Japan (all authors)

  • Venue:
  • MSP '04 Proceedings of the 2004 workshop on Memory system performance
  • Year:
  • 2004


Abstract

Instruction combining is an optimization that replaces a sequence of instructions with a more efficient instruction yielding the same result in fewer machine cycles. When applied to coalescing memory accesses, it can reduce memory traffic by combining narrow memory references with contiguous addresses into a single wider reference, taking advantage of a wide-bus architecture. Coalescing memory accesses can improve performance for two reasons: it reduces the additional cycles required for moving data from caches to registers, and it reduces the stall cycles caused by multiple outstanding memory access requests. Previous approaches to memory access coalescing focus only on array access instructions related to loop induction variables, and thus miss many other opportunities. In this paper, we propose a new algorithm for instruction combining that applies global code motion to wider regions of the given program in search of more potential candidates. We implemented two optimizations for coalescing memory accesses using our algorithm in the IBM Java™ JIT compiler for IA-64, one combining two 32-bit integer loads and the other combining two single-precision floating-point loads, and evaluated them on the SPECjvm98 benchmark suite. In our experiments, we improved the maximum performance by 5.5% with little additional compilation time overhead. Moreover, when we replaced every declaration of double for an instance variable with float, we improved the performance of the MolDyn benchmark in the JavaGrande benchmark suite by 7.3%. Our approach can be applied to a variety of architectures and to programming languages besides Java.