Instruction prefetching of systems codes with layout optimized for reduced cache misses

  • Authors:
  • Chun Xia;Josep Torrellas

  • Affiliations:
  • Sun Microsystems, Inc. and Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign, IL;Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign, IL

  • Venue:
  • ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
  • Year:
  • 1996

Quantified Score

Hi-index 0.00

Visualization

Abstract

High-performing on-chip instruction caches are crucial to keep fast processors busy. Unfortunately, while on-chip caches are usually successful at intercepting instruction fetches in loop-intensive engineering codes, they are less able to do so in large systems codes. To improve the performance of the latter codes, the compiler can be used to lay out the code in memory for reduced cache conflicts. Interestingly, such an operation leaves the code in a state that can be exploited by a new type of instruction prefetching: guarded sequential prefetching.The idea is that the compiler leaves hints in the code as to how the code was laid out. Then, at run time, the prefetching hardware detects these hints and uses them to prefetch more effectively. This scheme can be implemented very cheaply: one bit encoded in control transfer instructions and a prefetch module that requires minor extensions to existing next-line sequential prefetchers. Furthermore, the scheme can be turned off and on at run time with the toggling of a bit in the TLB. The scheme is evaluated with simulations using complete traces from a 4-processor machine. Overall, for 16-Kbyte primary instruction caches, guarded sequential prefetching removes, on average, 66% of the instruction misses remaining in an operating system with an optimized layout, speeding up the operating system by 10%. Moreover, the scheme is more cost-effective and robust than existing sequential prefetching techniques.