Optimizing the Instruction Cache Performance of the Operating System

Authors:
Joseph Torrellas; Chun Xia; Russell L. Daigle
Affiliations:
Univ. of Illinois at Urbana-Champaign, Urbana; BrightInfo; Tandem Computers, Inc.
Venue:
IEEE Transactions on Computers
Year:
1998

Abstract

High instruction cache hit rates are key to high performance. One known technique for improving the hit rate is to minimize cache interference by improving the layout of the basic blocks of the code. However, the performance impact of this technique has been reported for application code only, even though there is evidence that the operating system often uses the cache heavily and with less uniform patterns than applications. It is unknown how well existing optimizations perform for systems code and whether better optimizations can be found. This paper addresses that problem. It characterizes, in detail, the locality patterns of operating system code and shows that there is substantial locality. Unfortunately, caches are not able to extract much of it: rarely executed special-case code disrupts spatial locality, loops with few iterations that call routines make loop locality hard to exploit, and plenty of loop-less code hampers temporal locality. Based on these observations, we propose an algorithm to expose these localities and reduce interference in the cache. For a range of cache sizes, associativities, line sizes, and organizations, we show that the algorithm reduces total instruction miss rates by 31-86 percent, or up to 2.9 absolute percentage points. Under a simple model, this corresponds to execution time reductions on the order of 10-25 percent. In addition, our optimized operating system combines well with both optimized and unoptimized applications.
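
The effect of basic-block layout on cache interference can be illustrated with a small simulation. The sketch below is not the algorithm proposed in the paper; it is a toy direct-mapped instruction cache, written in Python, with hypothetical block names, sizes, cache parameters, and trace chosen only for illustration. It compares a layout in which rarely executed special-case blocks (`X`, `Y`) sit between frequently executed blocks (`A`, `B`, `C`) against a layout that places the frequently executed blocks contiguously.

```python
from typing import Dict, List


def assign_addresses(layout: List[str], sizes: Dict[str, int]) -> Dict[str, int]:
    """Place basic blocks back to back in layout order; return each start address."""
    addr = 0
    start = {}
    for name in layout:
        start[name] = addr
        addr += sizes[name]
    return start


def count_misses(trace: List[str], layout: List[str], sizes: Dict[str, int],
                 cache_bytes: int, line_bytes: int) -> int:
    """Count line misses in a direct-mapped instruction cache for a block trace."""
    start = assign_addresses(layout, sizes)
    num_lines = cache_bytes // line_bytes
    tags = [None] * num_lines  # memory line number currently held by each cache line
    misses = 0
    for name in trace:
        first = start[name] // line_bytes
        last = (start[name] + sizes[name] - 1) // line_bytes
        for line in range(first, last + 1):
            idx = line % num_lines  # direct-mapped placement
            if tags[idx] != line:
                tags[idx] = line
                misses += 1
    return misses


# Hypothetical example: A, B, C are executed often; X and Y are special-case code.
SIZES = {"A": 32, "B": 32, "C": 32, "X": 96, "Y": 96}
ORIGINAL = ["A", "X", "B", "Y", "C"]   # special-case code interleaved with hot code
OPTIMIZED = ["A", "B", "C", "X", "Y"]  # hot code contiguous, special cases moved out

# Illustrative trace: 100 passes over A, B, C; X runs on only 2 of the passes.
TRACE: List[str] = []
for i in range(100):
    TRACE.append("A")
    if i % 50 == 0:
        TRACE.append("X")
    TRACE += ["B", "C"]

if __name__ == "__main__":
    for label, layout in (("original", ORIGINAL), ("optimized", OPTIMIZED)):
        misses = count_misses(TRACE, layout, SIZES, cache_bytes=128, line_bytes=32)
        print(f"{label}: {misses} line misses")
```

In the interleaved layout of this toy example, the three frequently executed blocks all map to the same cache line and evict one another on every pass, while the contiguous layout lets them coexist. Real layout algorithms work from profile data and must handle far larger code bases, so the numbers here say nothing about the miss-rate reductions reported in the abstract.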