Optimization opportunities created by global data reordering

Authors:
Gadi Haber; Moshe Klausner; Vadim Eisenberg; Bilha Mendelson; Maxim Gurevich
Affiliation:
IBM Research Lab in Haifa, Israel (all authors)
Venue:
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Year:
2003

Abstract

Memory access has proven to be one of the bottlenecks in modern architectures. Improving memory locality and reducing the number of memory accesses can help relieve this bottleneck. We present a method for link-time, profile-based optimization that reorders the global data of the program and modifies its code accordingly. The proposed optimization reorders the entire global data of the program according to a representative execution rate of each instruction (or basic block) in the code. The reordering is done in a way that enables frequently executed Load instructions that reference the global data to be replaced with fast Add Immediate instructions. In addition, it tries to improve global data locality and to reduce the total size of the global data area. The optimization was implemented in FDPR (Feedback Directed Program Restructuring), a post-link optimizer that is part of the IBM AIX operating system for IBM pSeries servers. Our results on SPECint2000 show a significant improvement of up to 11% (average 3%) in execution time, along with a reduction of up to 97.9% (average 83%) in memory references to global variables via the program's global data access mechanism.
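
The core idea of the abstract can be illustrated with a minimal sketch. It is not the FDPR implementation. It assumes, for illustration only, that an Add Immediate instruction can form an address at a signed 16-bit displacement from a base register, and that a profile supplies a reference count per global variable. The names GlobalVar, reorder_globals, and reachable_by_immediate are hypothetical, and alignment is ignored. The sketch lays out variables from most to least frequently referenced, so that the frequently referenced ones fall inside the displacement window where an address load could be replaced by an add.

```python
from dataclasses import dataclass

# Assumption: the add-immediate displacement is a signed 16-bit value,
# so offsets in [-32768, 32767] from the base register are reachable.
IMM_MIN, IMM_MAX = -(1 << 15), (1 << 15) - 1


@dataclass
class GlobalVar:
    name: str
    size: int       # size in bytes
    ref_count: int  # profiled executions of instructions referencing it


def reorder_globals(variables, base_offset=IMM_MIN):
    """Assign offsets hottest-first, starting at base_offset from the anchor.

    Alignment and padding are ignored to keep the sketch short.
    """
    layout = {}
    offset = base_offset
    for var in sorted(variables, key=lambda v: v.ref_count, reverse=True):
        layout[var.name] = offset
        offset += var.size
    return layout


def reachable_by_immediate(layout):
    """Names of variables whose start offset fits the immediate field."""
    return {name for name, off in layout.items() if IMM_MIN <= off <= IMM_MAX}


# Hypothetical profile data, for illustration only.
example = [
    GlobalVar("tiny_cold", size=4, ref_count=0),
    GlobalVar("big_cold", size=70000, ref_count=1),
    GlobalVar("hot", size=8, ref_count=1000),
    GlobalVar("warm", size=4, ref_count=500),
]
example_layout = reorder_globals(example)
example_reachable = reachable_by_immediate(example_layout)
```

In this toy layout the rarely referenced variable placed after the large cold array falls outside the window, so it would still need the slower access path, while the frequently referenced variables do not. A real post-link optimizer would also have to rewrite every instruction that references the moved data.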