Performance tradeoffs in cache design
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Compile-Time Program Restructuring in Multiprogrammed Virtual Memory Systems
IEEE Transactions on Software Engineering
Optimal Sequential Partitions of Graphs
Journal of the ACM (JACM)
ACM Computing Surveys (CSUR)
Improving locality by critical working sets
Communications of the ACM
Code Reorginazation for Instruction Caches
Code Reorginazation for Instruction Caches
Automatic storage optimization.
Automatic storage optimization.
Cache management by the compiler
Cache management by the compiler
Aspects of cache memory and instruction buffer performance
Aspects of cache memory and instruction buffer performance
Achieving high instruction cache performance with an optimizing compiler
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Profile guided code positioning
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
The Evolution of Instruction Sequencing
Computer - Special issue on instruction sequencing
The effect of context switches on cache performance
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Improving instruction cache behavior by reducing cache pollution
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Predicting program behavior using real or estimated profiles
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Procedure merging with instruction caches
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Experience with a software-defined machine architecture
ACM Transactions on Programming Languages and Systems (TOPLAS)
Fast instruction cache performance evaluation using compile-time analysis
SIGMETRICS '92/PERFORMANCE '92 Proceedings of the 1992 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Page placement algorithms for large real-indexed caches
ACM Transactions on Computer Systems (TOCS)
Cache replacement with dynamic exclusion
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Characterizing the caching and synchronization performance of a multiprocessor operating system
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Prefetching in supercomputer instruction caches
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Efficient simulation of caches under optimal replacement with applications to miss characterization
SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
The impact of operating system structure on memory system performance
SOSP '93 Proceedings of the fourteenth ACM symposium on Operating systems principles
Efficient software-based fault isolation
SOSP '93 Proceedings of the fourteenth ACM symposium on Operating systems principles
Accurate static estimators for program optimization
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
SPAA '94 Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures
Compile time instruction cache optimizations
ACM SIGARCH Computer Architecture News - Special issue: panel sessions of the 1991 workshop on multithreaded computers
Static branch frequency and program profile analysis
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Trace-directed program restructuring for AIX executables
IBM Journal of Research and Development
Avoiding conflict misses dynamically in large direct-mapped caches
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Reducing branch costs via branch alignment
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Accurate static branch prediction by value range propagation
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Next cache line and set prediction
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Optimization of instruction fetch mechanisms for high issue rates
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Instruction fetching: coping with code bloat
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Instruction prefetching of systems codes with layout optimized for reduced cache misses
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Hot cold optimization of large Windows/NT applications
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Generating efficient protocol code from an abstract specification
Conference proceedings on Applications, technologies, architectures, and protocols for computer communications
Analysis of techniques to improve protocol processing latency
Conference proceedings on Applications, technologies, architectures, and protocols for computer communications
Speeding up protocols for small messages
Conference proceedings on Applications, technologies, architectures, and protocols for computer communications
Predictability of load/store instruction latencies
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Efficient procedure mapping using cache line coloring
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Near-optimal intraprocedural branch alignment
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Generating efficient protocol code from an abstract specification
IEEE/ACM Transactions on Networking (TON)
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Procedure placement using temporal ordering information
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Memory data organization for improved cache performance in embedded processor applications
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Code placement techniques for cache miss rate reduction
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Data transformations for eliminating conflict misses
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
A Performance Study of Instruction Cache Prefetching Methods
IEEE Transactions on Computers
ISLPED '98 Proceedings of the 1998 international symposium on Low power electronics and design
Segregating heap objects by reference behavior and lifetime
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Analysis of Temporal-Based Program Behavior for Improved Instruction Cache Performance
IEEE Transactions on Computers - Special issue on cache memory and related problems
Optimizing the Instruction Cache Performance of the Operating System
IEEE Transactions on Computers
Cache-conscious structure layout
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Reducing cache misses using hardware and software page placement
ICS '99 Proceedings of the 13th international conference on Supercomputing
Procedure placement using temporal-ordering information
ACM Transactions on Programming Languages and Systems (TOPLAS)
Application-specific memory management for embedded systems using software-controlled caches
Proceedings of the 37th Annual Design Automation Conference
Efficient and Precise Cache Behavior Prediction for Real-TimeSystems
Real-Time Systems
New directions in compiler technology for embedded systems (embedded tutorial)
Proceedings of the 2001 Asia and South Pacific Design Automation Conference
Data and memory optimization techniques for embedded systems
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Automated design of finite state machine predictors for customized processors
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Code layout optimizations for transaction processing workloads
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Software-assisted cache replacement mechanisms for embedded systems
Proceedings of the 2001 IEEE/ACM international conference on Computer-aided design
Proceedings of the 2001 IEEE/ACM international conference on Computer-aided design
Evaluation of Neural and Genetic Algorithms for Synthesizing Parallel Storage Schemes
International Journal of Parallel Programming
The Effect of Code Expanding Optimizations on Instruction Cache Design
IEEE Transactions on Computers
Code Positioning for VLIW Architectures
HPCN Europe 2001 Proceedings of the 9th International Conference on High-Performance Computing and Networking
Compiling for instruction cache performance on a multithreaded architecture
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Optimal Code Placement of Embedded Software for Instruction Caches
EDTC '96 Proceedings of the 1996 European conference on Design and Test
Optimizing instruction cache performance for operating system intensive workloads
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Predictive sequential associative cache
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Size-Constrained Code Placement for Cache Miss Rate Reduction
ISSS '96 Proceedings of the 9th international symposium on System synthesis
Efficient and Accurate Analytical Modeling of Whole-Program Data Cache Behavior
IEEE Transactions on Computers
Profile guided code positioning
ACM SIGPLAN Notices - Best of PLDI 1979-1999
Predicting program behavior using real or estimated profiles
ACM SIGPLAN Notices - Best of PLDI 1979-1999
Array organization in parallel memories
International Journal of Parallel Programming
Profile-directed restructuring of operating system code
IBM Systems Journal
A first look at the interplay of code reordering and configurable caches
GLSVLSI '05 Proceedings of the 15th ACM Great Lakes symposium on VLSI
Code placement for improving dynamic branch prediction accuracy
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Frequency-based code placement for embedded multiprocessors
Proceedings of the 42nd annual Design Automation Conference
A non-uniform cache architecture for low power system design
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
Instruction code mapping for performance increase and energy reduction in embedded computer systems
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A novel instruction scratchpad memory optimization method based on concomitance metric
ASP-DAC '06 Proceedings of the 2006 Asia and South Pacific Design Automation Conference
Multi-parametric improvements for embedded systems using code-placement and address bus coding
ASP-DAC '03 Proceedings of the 2003 Asia and South Pacific Design Automation Conference
The Camino Compiler infrastructure
ACM SIGARCH Computer Architecture News - Special issue on the 2005 workshop on binary instrumentation and application
A cache-defect-aware code placement algorithm for improving the performance of processors
ICCAD '05 Proceedings of the 2005 IEEE/ACM International conference on Computer-aided design
Fast and efficient partial code reordering: taking advantage of dynamic recompilatior
Proceedings of the 5th international symposium on Memory management
Whole-program optimization of global variable layout
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Compiler optimization to improve data locality for processor multithreading
Scientific Programming
Code reordering on limited branch offset
ACM Transactions on Architecture and Code Optimization (TACO)
Memory behavior of an X11 window system
WTEC'94 Proceedings of the USENIX Winter 1994 Technical Conference on USENIX Winter 1994 Technical Conference
Optimizing the performance of dynamically-linked programs
TCON'95 Proceedings of the USENIX 1995 Technical Conference Proceedings
A low power front-end for embedded processors using a block-aware instruction set
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Blind Optimization for Exploiting Hardware Features
CC '09 Proceedings of the 18th International Conference on Compiler Construction: Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009
Multicore-aware hybrid code positioning to reduce worst-case execution time
Proceedings of the 2010 Workshop on Interaction between Compilers and Computer Architecture
Improving TriMedia cache performance by profile guided code reordering
SAMOS'07 Proceedings of the 7th international conference on Embedded computer systems: architectures, modeling, and simulation
Studying microarchitectural structures with object code reordering
Proceedings of the Workshop on Binary Instrumentation and Applications
Code and Data Placement for Embedded Processors with Scratchpad and Cache Memories
Journal of Signal Processing Systems
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Compartmental memory management in a modern web browser
Proceedings of the international symposium on Memory management
Reducing memory space consumption through dataflow analysis
Computer Languages, Systems and Structures
Proceedings of the 9th International Conference on Principles and Practice of Programming in Java
Combining code reordering and cache configuration
ACM Transactions on Embedded Computing Systems (TECS)
Hi-index | 0.01 |
This paper presents an optimization algorithm for reducing instruction cache misses. The algorithm uses profile information to reposition programs in memory so that a direct-mapped cache behaves much like an optimal cache with full associativity and full knowledge of the future. For best results, the cache should have a mechanism for excluding certain instructions designated by the compiler. This paper first presents a reduced form of the algorithm. This form is shown to produce an optimal miss rate for programs without conditionals and with a tree call graph, assuming basic blocks can be reordered at will. If conditionals are allowed, but there are no loops within conditionals, the algorithm does as well as an optimal cache for the worst case execution of the program consistent with the profile information. Next, the algorithm is extended with heuristics for general programs. The effectiveness of these heuristics are demonstrated with empirical results for a set of 10 programs for various cache sizes. The improvement depends on cache size. For a 512 word cache, miss rates for a direct-mapped instruction cache are halved. For an 8K word cache, miss rates fall by over 75%. Over a wide range of cache sizes the algorithm is as effective as increasing the cache size by a factor of 3 times. For 512 words, the algorithm generates only 32% more misses than an optimal cache. Optimized programs on a direct-mapped cache have lower miss rates than unoptimized programs on set-associative caches of the same size.