Combining Optimization for Cache and Instruction-Level Parallelism

  • Authors: Steve Carr
  • Affiliations: -
  • Venue: PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques
  • Year: 1996

Abstract

Current architectural trends in instruction-level parallelism (ILP) have significantly increased the computational power of microprocessors. As a result, the demands on the memory system have increased dramatically. Compilers must not only find enough ILP to utilize machine resources effectively, but also ensure that the resulting code has a high degree of cache locality. Previous work has concentrated either on improving ILP in nested loops or on improving cache performance. This paper presents a performance metric that can be used to guide the optimization of nested loops, considering the combined effects of ILP, data reuse, and latency-hiding techniques. Preliminary experiments reveal that dramatic performance improvements for nested loops are obtainable (we regularly obtain speedups of at least a factor of 2 on kernels run on two different architectures).
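The paper's metric itself is not reproduced here, but the kind of loop-nest transformation it is meant to guide can be sketched. The C example below is written for this summary, not taken from the paper: it shows unroll-and-jam applied to a matrix-multiply kernel, where unrolling the outer loop and jamming the copies into the inner loop exposes independent multiply-adds (ILP) while reusing one loaded element of b across both copies (data reuse). The kernel, array sizes, and unroll factor are arbitrary illustrative choices.

```c
/* Illustrative sketch (not from the paper): unroll-and-jam on matrix multiply.
 * Goal: show how one transformation can improve both ILP and locality. */
#include <stdio.h>

#define N 64                      /* arbitrary even size for the example */

static double a[N][N], b[N][N], c[N][N];

/* Baseline loop nest: one multiply-add per iteration, b[k][j] reloaded for
 * every row i. */
static void matmul_baseline(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                c[i][j] += a[i][k] * b[k][j];
}

/* Unroll-and-jam: the i loop is unrolled by 2 and the two copies of the body
 * are jammed into the inner loop, so each load of b[k][j] feeds two
 * independent multiply-adds held in registers. */
static void matmul_unroll_and_jam(void)
{
    for (int i = 0; i < N; i += 2)
        for (int j = 0; j < N; j++) {
            double s0 = c[i][j], s1 = c[i + 1][j];
            for (int k = 0; k < N; k++) {
                double bkj = b[k][j];        /* one load shared by both rows */
                s0 += a[i][k] * bkj;
                s1 += a[i + 1][k] * bkj;
            }
            c[i][j] = s0;
            c[i + 1][j] = s1;
        }
}

int main(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            a[i][j] = 1.0;
            b[i][j] = 2.0;
            c[i][j] = 0.0;
        }

    matmul_baseline();
    double ref = c[0][0];

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            c[i][j] = 0.0;
    matmul_unroll_and_jam();

    /* Both versions should compute the same result: 2 * N = 128. */
    printf("baseline: %g  unroll-and-jam: %g\n", ref, c[0][0]);
    return 0;
}
```

Choosing the unroll factor is exactly the kind of decision the paper's metric addresses: larger factors expose more parallelism and reuse but increase register pressure and loop-body size, so the benefit depends on the target machine.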