Compilers: principles, techniques, and tools
Compilers: principles, techniques, and tools
Cache performance of operating system and multiprogramming workloads
ACM Transactions on Computer Systems (TOCS)
The VMP multiprocessor: initial experience, refinements, and performance evaluation
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Trace selection for compiling large C application programs to microcode
MICRO 21 Proceedings of the 21st annual workshop on Microprogramming and microarchitecture
Program optimization for instruction caches
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Achieving high instruction cache performance with an optimizing compiler
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Profile guided code positioning
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
The interaction of architecture and operating system design
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Improving instruction cache behavior by reducing cache pollution
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Procedure merging with instruction caches
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Characterizing the caching and synchronization performance of a multiprocessor operating system
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Ordering functions for improving memory reference locality in a shared memory multiprocessor system
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
The impact of operating system structure on memory system performance
SOSP '93 Proceedings of the fourteenth ACM symposium on Operating systems principles
Compile time instruction cache optimizations
ACM SIGARCH Computer Architecture News - Special issue: panel sessions of the 1991 workshop on multithreaded computers
Optimal allocation of on-chip memory for multiple-API operating systems
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Trace-directed program restructuring for AIX executables
IBM Journal of Research and Development
Cache Performance in the VAX-11/780
ACM Transactions on Computer Systems (TOCS)
The Effect of Code Expanding Optimizations on Instruction Cache Design
IEEE Transactions on Computers
Code Reorginazation for Instruction Caches
Code Reorginazation for Instruction Caches
Instruction prefetching of systems codes with layout optimized for reduced cache misses
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Efficient and Accurate Analytical Modeling of Whole-Program Data Cache Behavior
IEEE Transactions on Computers
A proposal for input-sensitivity analysis of profile-driven optimizations on embedded applications
MEDEA '03 Proceedings of the 2003 workshop on MEmory performance: DEaling with Applications , systems and architecture
Optimizing instruction cache performance of embedded systems
ACM Transactions on Embedded Computing Systems (TECS)
Performance/area efficiency in chip multiprocessors with micro-caches
Proceedings of the 4th international conference on Computing frontiers
Virtually split cache: An efficient mechanism to distribute instructions and data
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 14.98 |
High instruction cache hit rates are key to high performance. One known technique to improve the hit rate of caches is to minimize cache interference by improving the layout of the basic blocks of the code. However, the performance impact of this technique has been reported for application code only, even though there is evidence that the operating system often uses the cache heavily and with less uniform patterns than applications. It is unknown how well existing optimizations perform for systems code and whether better optimizations can be found. We address this problem in this paper. This paper characterizes, in detail, the locality patterns of the operating system code and shows that there is substantial locality. Unfortunately, caches are not able to extract much of it: Rarely-executed special-case code disrupts spatial locality, loops with few iterations that call routines make loop locality hard to exploit, and plenty of loop-less code hampers temporal locality. Based on our observations, we propose an algorithm to expose these localities and reduce interference in the cache. For a range of cache sizes, associativities, lines sizes, and organizations, we show that we reduce total instruction miss rates by 31-86 percent, or up to 2.9 absolute points. Using a simple model, this corresponds to execution time reductions of the order of 10-25 percent. In addition, our optimized operating system combines well with optimized and unoptimized applications.