Program optimization for instruction caches
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Achieving high instruction cache performance with an optimizing compiler
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Profile guided code positioning
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
ATOM: a system for building customized program analysis tools
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Performance issues in correlated branch prediction schemes
Proceedings of the 28th annual international symposium on Microarchitecture
Exploiting hardware performance counters with flow and context sensitive profiling
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Efficient procedure mapping using cache line coloring
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Near-optimal intraprocedural branch alignment
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Optimizing instruction cache performance for operating system intensive workloads
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Applications of randomness in system performance measurement
Applications of randomness in system performance measurement
Analyzing the working set characteristics of branch execution
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Better global scheduling using path profiles
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Cache-conscious data placement
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Overlapping execution with transfer using non-strict execution for mobile programs
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Analysis of Temporal-Based Program Behavior for Improved Instruction Cache Performance
IEEE Transactions on Computers - Special issue on cache memory and related problems
Cache-conscious structure layout
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
ICS '99 Proceedings of the 13th international conference on Supercomputing
Reducing cache misses using hardware and software page placement
ICS '99 Proceedings of the 13th international conference on Supercomputing
Procedure placement using temporal-ordering information
ACM Transactions on Programming Languages and Systems (TOPLAS)
Static correlated branch prediction
ACM Transactions on Programming Languages and Systems (TOPLAS)
Automated data-member layout of heap objects to improve memory-hierarchy performance
ACM Transactions on Programming Languages and Systems (TOPLAS)
Architectural and compiler support for effective instruction prefetching: a cooperative approach
ACM Transactions on Computer Systems (TOCS)
Offline program re-mapping to improve branch prediction efficiency in embedded systems
ASP-DAC '00 Proceedings of the 2000 Asia and South Pacific Design Automation Conference
Efficient representations and abstractions for quantifying and exploiting data reference locality
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Code layout optimizations for transaction processing workloads
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Compiler-Directed Collective-I/O
IEEE Transactions on Parallel and Distributed Systems
Software Trace Cache for Commercial Applications
International Journal of Parallel Programming
Design and Evaluation of a Compiler-Directed Collective I/O Technique
Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
A Collective I/O Scheme Based on Compiler Analysis
LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Compiling for instruction cache performance on a multithreaded architecture
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Continuous program optimization: A case study
ACM Transactions on Programming Languages and Systems (TOPLAS)
Call graph prefetching for database applications
ACM Transactions on Computer Systems (TOCS)
A proposal for input-sensitivity analysis of profile-driven optimizations on embedded applications
MEDEA '03 Proceedings of the 2003 workshop on MEmory performance: DEaling with Applications , systems and architecture
IEEE Transactions on Computers
Compiler Managed Dynamic Instruction Placement in a Low-Power Code Cache
Proceedings of the international symposium on Code generation and optimization
A first look at the interplay of code reordering and configurable caches
GLSVLSI '05 Proceedings of the 15th ACM Great Lakes symposium on VLSI
Optimizing instruction cache performance of embedded systems
ACM Transactions on Embedded Computing Systems (TECS)
Restructuring field layouts for embedded memory systems
Proceedings of the conference on Design, automation and test in Europe: Proceedings
Cache-conscious coallocation of hot data streams
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Modeling instruction placement on a spatial architecture
Proceedings of the eighteenth annual ACM symposium on Parallelism in algorithms and architectures
Procedure placement using temporal-ordering information: Dealing with code size expansion
Journal of Embedded Computing - Cache exploitation in embedded systems
Optimal task placement to improve cache performance
EMSOFT '07 Proceedings of the 7th ACM & IEEE international conference on Embedded software
Curl: a language for web content
International Journal of Web Engineering and Technology
Blind Optimization for Exploiting Hardware Features
CC '09 Proceedings of the 18th International Conference on Compiler Construction: Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009
Improving TriMedia cache performance by profile guided code reordering
SAMOS'07 Proceedings of the 7th international conference on Embedded computer systems: architectures, modeling, and simulation
Proceedings of the international conference on Supercomputing
Automatic code overlay generation and partially redundant code fetch elimination
ACM Transactions on Architecture and Code Optimization (TACO)
An automatic code overlaying technique for multicores with explicitly-managed memory hierarchies
Proceedings of the Tenth International Symposium on Code Generation and Optimization
Combining code reordering and cache configuration
ACM Transactions on Embedded Computing Systems (TECS)
Hi-index | 0.00 |
Instruction cache performance is very important to instruction fetch efficiency and overall processor performance. The layout of an executable has a substantial effect on the cache miss rate during execution. This means that the performance of an executable can be improved significantly by applying a code-placement algorithm that minimizes instruction cache conflicts. We describe an algorithm for procedure placement, one type of code-placement algorithm, that significantly differs from previous approaches in the type of information used to drive the placement algorithm. In particular, we gather temporal ordering information that summarizes the interleaving of procedures in a program trace. Our algorithm uses this information along with cache configuration and procedure size information to better estimate the conflict cost of a potential procedure ordering. We compare the performance of our algorithm with previously published procedure-placement algorithms and show noticeable improvements in the instruction cache behavior.