The effect of instruction set complexity on program size and memory performance
ASPLOS II Proceedings of the second international conference on Architectual support for programming languages and operating systems
DOC: a practical approach to source-level debugging of globally optimized code
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Compile-Time Program Restructuring in Multiprogrammed Virtual Memory Systems
IEEE Transactions on Software Engineering
Program optimization for instruction caches
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Determining average program execution times and their variance
PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
Achieving high instruction cache performance with an optimizing compiler
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Improving locality by critical working sets
Communications of the ACM
Gprof: A call graph execution profiler
SIGPLAN '82 Proceedings of the 1982 SIGPLAN symposium on Compiler construction
CCG: a prototype coagulating code generator
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Predicting program behavior using real or estimated profiles
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Procedure merging with instruction caches
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Experience with a software-defined machine architecture
ACM Transactions on Programming Languages and Systems (TOPLAS)
Fast instruction cache performance evaluation using compile-time analysis
SIGMETRICS '92/PERFORMANCE '92 Proceedings of the 1992 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Cache replacement with dynamic exclusion
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Optimally profiling and tracing programs
POPL '92 Proceedings of the 19th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Ordering functions for improving memory reference locality in a shared memory multiprocessor system
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Efficient software-based fault isolation
SOSP '93 Proceedings of the fourteenth ACM symposium on Operating systems principles
Improving semi-static branch prediction by code replication
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Compile time instruction cache optimizations
ACM SIGARCH Computer Architecture News - Special issue: panel sessions of the 1991 workshop on multithreaded computers
Optimally profiling and tracing programs
ACM Transactions on Programming Languages and Systems (TOPLAS)
Fast and accurate instruction fetch and branch prediction
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Static branch frequency and program profile analysis
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Trace-directed program restructuring for AIX executables
IBM Journal of Research and Development
Avoiding conflict misses dynamically in large direct-mapped caches
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Reducing branch costs via branch alignment
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Compiler transformations for high-performance computing
ACM Computing Surveys (CSUR)
Accurate static branch prediction by value range propagation
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Simple and effective link-time optimization of Modula-3 programs
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Next cache line and set prediction
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Optimization of instruction fetch mechanisms for high issue rates
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Performance issues in correlated branch prediction schemes
Proceedings of the 28th annual international symposium on Microarchitecture
The predictability of branches in libraries
Proceedings of the 28th annual international symposium on Microarchitecture
SPAID: software prefetching in pointer- and call-intensive environments
Proceedings of the 28th annual international symposium on Microarchitecture
Evidence-based static branch prediction using machine learning
ACM Transactions on Programming Languages and Systems (TOPLAS)
Hot cold optimization of large Windows/NT applications
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Generating efficient protocol code from an abstract specification
Conference proceedings on Applications, technologies, architectures, and protocols for computer communications
Analysis of techniques to improve protocol processing latency
Conference proceedings on Applications, technologies, architectures, and protocols for computer communications
Cache behavior of network protocols
SIGMETRICS '97 Proceedings of the 1997 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Dynamic feedback: an effective technique for adaptive computing
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Interprocedural dataflow analysis in an executable optimizer
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Efficient procedure mapping using cache line coloring
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Near-optimal intraprocedural branch alignment
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Resource-bounded partial evaluation
PEPM '97 Proceedings of the 1997 ACM SIGPLAN symposium on Partial evaluation and semantics-based program manipulation
Generating efficient protocol code from an abstract specification
IEEE/ACM Transactions on Networking (TON)
System support for automatic profiling and optimization
Proceedings of the sixteenth ACM symposium on Operating systems principles
Procedure placement using temporal ordering information
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
DAC '98 Proceedings of the 35th annual Design Automation Conference
Scalable cross-module optimization
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Execution characteristics of desktop applications on Windows NT
Proceedings of the 25th annual international symposium on Computer architecture
Compact and efficient presentation conversion code
IEEE/ACM Transactions on Networking (TON)
Better global scheduling using path profiles
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Cache-conscious data placement
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Overlapping execution with transfer using non-strict execution for mobile programs
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Analysis of Temporal-Based Program Behavior for Improved Instruction Cache Performance
IEEE Transactions on Computers - Special issue on cache memory and related problems
Optimizing the Instruction Cache Performance of the Operating System
IEEE Transactions on Computers
Cache-conscious structure layout
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Comprehensive Hardware and Software Support for Operating Systems to Exploit MP Memory Hierarchies
IEEE Transactions on Computers
ICS '99 Proceedings of the 13th international conference on Supercomputing
Reducing cache misses using hardware and software page placement
ICS '99 Proceedings of the 13th international conference on Supercomputing
Eliminating synchronization overhead in automatically parallelized programs using dynamic feedback
ACM Transactions on Computer Systems (TOCS)
Fetch directed instruction prefetching
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Reducing transfer delay using Java class file splitting and prefetching
Proceedings of the 14th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Procedure placement using temporal-ordering information
ACM Transactions on Programming Languages and Systems (TOPLAS)
Static correlated branch prediction
ACM Transactions on Programming Languages and Systems (TOPLAS)
Efficient and Precise Cache Behavior Prediction for Real-TimeSystems
Real-Time Systems
A hardware mechanism for dynamic extraction and relayout of program hot spots
Proceedings of the 27th annual international symposium on Computer architecture
Compiler techniques for code compaction
ACM Transactions on Programming Languages and Systems (TOPLAS)
Architectural and compiler support for effective instruction prefetching: a cooperative approach
ACM Transactions on Computer Systems (TOCS)
Offline program re-mapping to improve branch prediction efficiency in embedded systems
ASP-DAC '00 Proceedings of the 2000 Asia and South Pacific Design Automation Conference
Code layout optimizations for transaction processing workloads
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Partial method compilation using dynamic profile information
OOPSLA '01 Proceedings of the 16th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
A fast on-chip profiler memory
Proceedings of the 39th annual Design Automation Conference
Handling irreducible loops: optimized node splitting versus DJ-graphs
ACM Transactions on Programming Languages and Systems (TOPLAS)
Online feedback-directed optimization of Java
OOPSLA '02 Proceedings of the 17th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Software Trace Cache for Commercial Applications
International Journal of Parallel Programming
The Effect of Code Expanding Optimizations on Instruction Cache Design
IEEE Transactions on Computers
The set-associative cache performance of search trees
SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
Modeling the impact of run-time uncertainty on optimal computation scheduling using feedback
ICPP '97 Proceedings of the international Conference on Parallel Processing
Code Positioning for VLIW Architectures
HPCN Europe 2001 Proceedings of the 9th International Conference on High-Performance Computing and Networking
Speculative Alias Analysis for Executable Code
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
On the Performance of Fetch Engines Running DSS Workloads
Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Load Redundancy Elimination on Executable Code
Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
CC '01 Proceedings of the 10th International Conference on Compiler Construction
Energy frugal tags in reprogrammable I-caches for application-specific embedded processors
Proceedings of the tenth international symposium on Hardware/software codesign
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Compiling for instruction cache performance on a multithreaded architecture
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Dynamic binary translation for accumulator-oriented architectures
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Optimization opportunities created by global data reordering
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Continuous program optimization: A case study
ACM Transactions on Programming Languages and Systems (TOPLAS)
On the side-effects of code abstraction
Proceedings of the 2003 ACM SIGPLAN conference on Language, compiler, and tool for embedded systems
A region-based compilation technique for a Java just-in-time compiler
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Optimized code restructuring of OS/2 executables
CASCON '95 Proceedings of the 1995 conference of the Centre for Advanced Studies on Collaborative research
The Use of Feedback in Scheduling Parallel Computations
PAS '97 Proceedings of the 2nd AIZU International Symposium on Parallel Algorithms / Architecture Synthesis
Call graph prefetching for database applications
ACM Transactions on Computer Systems (TOCS)
Frequent loop detection using efficient non-intrusive on-chip hardware
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Improving spatial locality of programs via data mining
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Cache-Aware Scratchpad Allocation Algorithm
Proceedings of the conference on Design, automation and test in Europe - Volume 2
Ispike: A Post-link Optimizer for the Intel®Itanium®Architecture
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Predicting program behavior using real or estimated profiles
ACM SIGPLAN Notices - Best of PLDI 1979-1999
Buffering databse operations for enhanced instruction cache performance
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Profile-directed restructuring of operating system code
IBM Systems Journal
Procedure placement using temporal-ordering information: dealing with code size expansion
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
A proposal for input-sensitivity analysis of profile-driven optimizations on embedded applications
MEDEA '03 Proceedings of the 2003 workshop on MEmory performance: DEaling with Applications , systems and architecture
IEEE Transactions on Computers
Using trace analysis for improving performance in COTS systems
CASCON '04 Proceedings of the 2004 conference of the Centre for Advanced Studies on Collaborative research
Performance of Runtime Optimization on BLAST
Proceedings of the international symposium on Code generation and optimization
A first look at the interplay of code reordering and configurable caches
GLSVLSI '05 Proceedings of the 15th ACM Great Lakes symposium on VLSI
Dynamic run-time architecture techniques for enabling continuous optimization
Proceedings of the 2nd conference on Computing frontiers
Code placement for improving dynamic branch prediction accuracy
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Evolution of a java just-in-time compiler for IA-32 platforms
IBM Journal of Research and Development
Frequent Loop Detection Using Efficient Nonintrusive On-Chip Hardware
IEEE Transactions on Computers
Link-time binary rewriting techniques for program compaction
ACM Transactions on Programming Languages and Systems (TOPLAS)
Optimizing Compiler for the CELL Processor
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Continuous Path and Edge Profiling
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
A region-based compilation technique for dynamic compilers
ACM Transactions on Programming Languages and Systems (TOPLAS)
Optimizing instruction cache performance of embedded systems
ACM Transactions on Embedded Computing Systems (TECS)
Improving WCET by applying a WC code-positioning optimization
ACM Transactions on Architecture and Code Optimization (TACO)
2D-Profiling: Detecting Input-Dependent Branches with a Single Input Data Set
Proceedings of the International Symposium on Code Generation and Optimization
Dynamic inference of polymorphic lock types
Science of Computer Programming - Special issue: Concurrency and synchronization in Java programs
Fast and efficient partial code reordering: taking advantage of dynamic recompilatior
Proceedings of the 5th international symposium on Memory management
Whole-program optimization of global variable layout
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Adapting compilation techniques to enhance the packing of instructions into registers
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Scratchpad memory management for portable systems with a memory management unit
EMSOFT '06 Proceedings of the 6th ACM & IEEE International conference on Embedded software
TRICK: tracking and reusing compiler's knowledge
ACM SIGPLAN Notices
Procedure placement using temporal-ordering information: Dealing with code size expansion
Journal of Embedded Computing - Cache exploitation in embedded systems
Ablego: a function outlining and partial inlining framework: Research Articles
Software—Practice & Experience
Code reordering on limited branch offset
ACM Transactions on Architecture and Code Optimization (TACO)
Online optimizations driven by hardware performance monitoring
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Protecting against unexpected system calls
SSYM'05 Proceedings of the 14th conference on USENIX Security Symposium - Volume 14
External memory page remapping for embedded multimedia systems
Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
DRIM: a low power dynamically reconfigurable instruction memory hierarchy for embedded systems
Proceedings of the conference on Design, automation and test in Europe
Improving UNIX kernel performance using profile based optimization
WTEC'94 Proceedings of the USENIX Winter 1994 Technical Conference on USENIX Winter 1994 Technical Conference
Optimizing the performance of dynamically-linked programs
TCON'95 Proceedings of the USENIX 1995 Technical Conference Proceedings
Instrumentation and optimization of Win32/intel executables using Etch
NT'97 Proceedings of the USENIX Windows NT Workshop on The USENIX Windows NT Workshop 1997
Spike: an optimizer for alpha/NT executables
NT'97 Proceedings of the USENIX Windows NT Workshop on The USENIX Windows NT Workshop 1997
Improving instruction locality with just-in-time code layout
NT'97 Proceedings of the USENIX Windows NT Workshop on The USENIX Windows NT Workshop 1997
Reducing startup latency in web and desktop applications
WINSYM'99 Proceedings of the 3rd conference on USENIX Windows NT Symposium - Volume 3
A low power front-end for embedded processors using a block-aware instruction set
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Dynamic scratchpad memory management for code in portable systems with an MMU
ACM Transactions on Embedded Computing Systems (TECS)
Trace fragment selection within method-based JVMs
Proceedings of the fourth ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Dynamic round-robin task scheduling to reduce cache misses for embedded systems
Proceedings of the conference on Design, automation and test in Europe
Dynamic and On-Line Design Space Exploration for Reconfigurable Architectures
Transactions on High-Performance Embedded Architectures and Compilers I
Blind Optimization for Exploiting Hardware Features
CC '09 Proceedings of the 18th International Conference on Compiler Construction: Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009
Linux Kernel Compaction through Cold Code Swapping
Transactions on High-Performance Embedded Architectures and Compilers II
Scenario Based Optimization: A Framework for Statically Enabling Online Optimizations
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
Feedback-directed specialization of code
Computer Languages, Systems and Structures
Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Novel online profiling for virtual machines
Proceedings of the 6th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
A hardware/software framework for instruction and data scratchpad memory allocation
ACM Transactions on Architecture and Code Optimization (TACO)
Multicore-aware hybrid code positioning to reduce worst-case execution time
Proceedings of the 2010 Workshop on Interaction between Compilers and Computer Architecture
Improving TriMedia cache performance by profile guided code reordering
SAMOS'07 Proceedings of the 7th international conference on Embedded computer systems: architectures, modeling, and simulation
Run-time randomization to mitigate tampering
IWSEC'07 Proceedings of the Security 2nd international conference on Advances in information and computer security
Code arrangement of embedded java virtual machine for NAND flash memory
HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
Studying microarchitectural structures with object code reordering
Proceedings of the Workshop on Binary Instrumentation and Applications
Evaluating the dynamic behaviour of Python applications
ACSC '09 Proceedings of the Thirty-Second Australasian Conference on Computer Science - Volume 91
Fine-grain dynamic instruction placement for L0 scratch-pad memory
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Improved procedure placement for set associative caches
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Improving the performance of trace-based systems by false loop filtering
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Cache sensitive code arrangement for virtual machine
Transactions on high-performance embedded architectures and compilers III
Interpreter instruction scheduling
CC'11/ETAPS'11 Proceedings of the 20th international conference on Compiler construction: part of the joint European conferences on theory and practice of software
Loaf: a framework and infrastructure for creating online adaptive solutions
Proceedings of the 1st International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era
Reducing memory space consumption through dataflow analysis
Computer Languages, Systems and Structures
Improving performance through deep value profiling and specialization with code transformation
Computer Languages, Systems and Structures
Using platform-specific performance counters for dynamic compilation
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Proceedings of the 9th International Conference on Principles and Practice of Programming in Java
Efficient scratchpad allocation algorithms for energy constrained embedded systems
PACS'03 Proceedings of the Third international conference on Power - Aware Computer Systems
Automatic code overlay generation and partially redundant code fetch elimination
ACM Transactions on Architecture and Code Optimization (TACO)
An automatic code overlaying technique for multicores with explicitly-managed memory hierarchies
Proceedings of the Tenth International Symposium on Code Generation and Optimization
Combining code reordering and cache configuration
ACM Transactions on Embedded Computing Systems (TECS)
Survey of Low-Energy Techniques for Instruction Memory Organisations in Embedded Systems
Journal of Signal Processing Systems
Simple profile rectifications go a long way
ECOOP'13 Proceedings of the 27th European conference on Object-Oriented Programming
Post-compiler software optimization for reducing energy
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Reducing instruction fetch energy in multi-issue processors
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.02 |
This paper presents the results of our investigation of code positioning techniques using execution profile data as input into the compilation process. The primary objective of the positioning is to reduce the overhead of the instruction memory hierarchy.After initial investigation in the literature, we decided to implement two prototypes for the Hewlett-Packard Precision Architecture (PA-RISC). The first, built on top of the linker, positions code based on whole procedures. This prototype has the ability to move procedures into an order that is determined by a “closest is best” strategy.The second prototype, built on top of an existing optimizer package, positions code based on basic blocks within procedures. Groups of basic blocks that would be better as straight-line sequences are identified as chains. These chains are then ordered according to branch heuristics. Code that is never executed during the data collection runs can be physically separated from the primary code of a procedure by a technique we devised called procedure splitting.The algorithms we implemented are described through examples in this paper. The performance improvements from our work are also summarized in various tables and charts.