Unroll-and-jam using uniformly generated sets

  • Authors:
  • Steve Carr;Yiping Guan

  • Affiliations:
  • Department of Computer Science, Michigan Technological University, Houghton MI;Shafi Inc., 3637 Old US 23 Ste. 300, Brighton MI

  • Venue:
  • MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
  • Year:
  • 1997

Quantified Score

Hi-index 0.00

Visualization

Abstract

Modern architectural trends in instruction-level parallelism (ILP) are to increase the computational power of microprocessors significantly. As a result, the demands on memory have increased. Unfortunately, memory systems have not kept pace. Even hierarchical cache structures are ineffective if programs do not exhibit cache locality. Because of this compilers need to be concerned not only with finding ILP to utilize machine resources effectively, but also with ensuring that the resulting code has a high degree of cache locality. One compiler transformation that is essential for a compiler to meet the above objectives is unroll-and-jam, or outer-loop unrolling. Previous work either has used a dependence-based model to compute unroll amounts, significantly increasing the size of the dependence graph, or has applied a more brute force technique. In this paper, we present an algorithm that uses a linear-algebra-based technique to compute unroll amounts. This technique results in an 84% reduction over dependence-based techniques in the total number of dependences needed in our benchmark suite. Additionally, there is no loss in optimization performance over previous techniques and a more elegant solution is utilized.