An automatic overlay generator
IBM Journal of Research and Development
A Case for Direct-Mapped Caches
Computer
Program optimization for instruction caches
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Achieving high instruction cache performance with an optimizing compiler
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Profile guided code positioning
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
ATOM: a system for building customized program analysis tools
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Expected I-cache miss rates via the gap model
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Avoiding conflict misses dynamically in large direct-mapped caches
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Reducing branch costs via branch alignment
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Compiler optimizations for improving data locality
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
An inter-reference gap model for temporal locality in program behavior
Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Hot cold optimization of large Windows/NT applications
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Wrong-path instruction prefetching
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Performance analysis using very large memory on the 64-bit AlphaServer system
Digital Technical Journal
Exploiting hardware performance counters with flow and context sensitive profiling
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Efficient procedure mapping using cache line coloring
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Near-optimal intraprocedural branch alignment
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Prefetching using Markov predictors
Proceedings of the 24th annual international symposium on Computer architecture
Procedure placement using temporal ordering information
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Better global scheduling using path profiles
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Cache-conscious data placement
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
ACM Computing Surveys (CSUR)
ACM Computing Surveys (CSUR)
Improving locality by critical working sets
Communications of the ACM
Optimizing instruction cache performance for operating system intensive workloads
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Temporal-Based Procedure Reordering for Improved Instruction Cache Performance
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Applications of randomness in system performance measurement
Applications of randomness in system performance measurement
Improving instruction locality with just-in-time code layout
NT'97 Proceedings of the USENIX Windows NT Workshop on The USENIX Windows NT Workshop 1997
Overcoming the challenges to feedback-directed optimization (Keynote Talk)
DYNAMO '00 Proceedings of the ACM SIGPLAN workshop on Dynamic and adaptive compilation and optimization
The hardness of cache conscious data placement
POPL '02 Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Code Positioning for VLIW Architectures
HPCN Europe 2001 Proceedings of the 9th International Conference on High-Performance Computing and Networking
Improving spatial locality of programs via data mining
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Efficient and Accurate Analytical Modeling of Whole-Program Data Cache Behavior
IEEE Transactions on Computers
Array regrouping and structure splitting using whole-program reference affinity
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Buffering databse operations for enhanced instruction cache performance
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Procedure placement using temporal-ordering information: dealing with code size expansion
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
Code placement for improving dynamic branch prediction accuracy
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
A hierarchical model of data locality
Conference record of the 33rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
A novel instruction scratchpad memory optimization method based on concomitance metric
ASP-DAC '06 Proceedings of the 2006 Asia and South Pacific Design Automation Conference
The Camino Compiler infrastructure
ACM SIGARCH Computer Architecture News - Special issue on the 2005 workshop on binary instrumentation and application
Fast and efficient partial code reordering: taking advantage of dynamic recompilatior
Proceedings of the 5th international symposium on Memory management
Dynamic code management: improving whole program code locality in managed runtimes
Proceedings of the 2nd international conference on Virtual execution environments
The hardness of cache conscious data placement
Nordic Journal of Computing
Procedure placement using temporal-ordering information: Dealing with code size expansion
Journal of Embedded Computing - Cache exploitation in embedded systems
Dynamic round-robin task scheduling to reduce cache misses for embedded systems
Proceedings of the conference on Design, automation and test in Europe
Abstracting access patterns of dynamic memory using regular expressions
ACM Transactions on Architecture and Code Optimization (TACO)
HitME: low power Hit MEmory buffer for embedded systems
Proceedings of the 2009 Asia and South Pacific Design Automation Conference
Linux Kernel Compaction through Cold Code Swapping
Transactions on High-Performance Embedded Architectures and Compilers II
Layout transformations for heap objects using static access patterns
CC'07 Proceedings of the 16th international conference on Compiler construction
Studying microarchitectural structures with object code reordering
Proceedings of the Workshop on Binary Instrumentation and Applications
Instruction cache locking using temporal reuse profile
Proceedings of the 47th Design Automation Conference
Fine-grain dynamic instruction placement for L0 scratch-pad memory
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Improved procedure placement for set associative caches
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Optimization of dense matrix multiplication on IBM cyclops-64: challenges and experiences
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
Steganography for executables and code transformation signatures
ICISC'04 Proceedings of the 7th international conference on Information Security and Cryptology
Hi-index | 0.00 |
Instruction cache performance is important to instruction fetch efficiency and overall processor performance. The layout of an executable has a substantial effect on the cache miss rate and the instruction working set size during execution. This means that the performance of an executable can be improved by applying a code-placement algorithm that minimizes instruction cache conflicts and improves spatial locality. We describe an algorithm for procedure placement, one type of code placement, that signicantly differs from previous approaches in the type of information used to drive the placement algorithm. In particular, we gather temporal-ordering information that summarizes the interleaving of procedures in a program trace. Our algorithm uses this information along with cache configuration and procedure size information to better estimate the conflict cost of a potential procedure ordering. It optimizes the procedure placement for single level and multilevel caches. In addition to reducing instruction cache conflicts, the algorithm simultaneously minimizes the instruction working set size of the program. We compare the performance of our algorithm with a particularly successful procedure-placement algorithm and show noticeable improvements in the instruction cache behavior, while maintaining the same instruction working set size.