Memory access has proven to be one of the bottlenecks in modern architectures. Improving memory locality and reducing the number of memory accesses can help relieve this bottleneck. We present a method for link-time, profile-based optimization that reorders the global data of the program and modifies its code accordingly. The proposed optimization reorders the entire global data of the program based on a representative execution rate of each instruction (or basic block) in the code. The reordering is done in a way that allows frequently executed Load instructions that reference the global data to be replaced with fast Add Immediate instructions. In addition, it tries to improve global data locality and to reduce the total size of the global data area. The optimization was implemented in FDPR (Feedback Directed Program Restructuring), a post-link optimizer that is part of the IBM AIX operating system for IBM pSeries servers. Our results on SPECint2000 show an improvement of up to 11% (average 3%) in execution time, along with a reduction of up to 97.9% (average 83%) in memory references to the global variables made through the program's global data access mechanism.
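The core idea, placing frequently referenced globals where their address can be formed by an add-immediate from a base register, can be illustrated with a small sketch. This is not the FDPR algorithm, which the abstract does not spell out; it is a minimal greedy layout under stated assumptions: the immediate field is signed 16-bit (so offsets 0 to 32767 are reachable on the positive side), alignment is ignored, and the names `GlobalVar`, `layout_by_heat`, and `replaced_fraction` are hypothetical.

```python
from dataclasses import dataclass

# Assumption: a signed 16-bit immediate reaches offsets 0..32767 above the base.
IMM_WINDOW = 32768


@dataclass
class GlobalVar:
    name: str
    size: int       # size in bytes
    ref_count: int  # profiled execution count of loads that reference it


def layout_by_heat(variables, window=IMM_WINDOW):
    """Greedy layout: hottest variables first, starting at offset 0.

    Returns (offsets, reachable), where offsets maps name -> byte offset
    from the base and reachable is the set of names whose start offset
    fits in the immediate window.
    """
    ordered = sorted(variables, key=lambda v: v.ref_count, reverse=True)
    offsets, reachable = {}, set()
    cursor = 0
    for v in ordered:
        offsets[v.name] = cursor
        if cursor < window:
            reachable.add(v.name)
        cursor += v.size  # alignment padding ignored in this sketch
    return offsets, reachable


def replaced_fraction(variables, reachable):
    """Fraction of profiled references whose address load could, under the
    assumptions above, be replaced by an add-immediate."""
    total = sum(v.ref_count for v in variables)
    hit = sum(v.ref_count for v in variables if v.name in reachable)
    return hit / total if total else 0.0
```

With a hot 8-byte variable, a 40000-byte array, and a rarely used 16-byte variable, this layout puts the hot variable and the array start inside the window and pushes only the cold variable out of reach. A real post-link optimizer would also have to respect alignment, aliasing, and address-taken variables, none of which this sketch models.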