The Combined Effectiveness of Unimodular Transformations, Tiling, and Software Prefetching

  • Authors:
  • Rafael H. Saavedra-Barrera;Weihua Mao;Daeyeon Park;Jacqueline Chame;Sungdo Moon

  • Venue:
  • IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
  • Year:
  • 1996

Abstract

Unimodular transformations, tiling, and software prefetching are loop optimizations known to be effective in increasing parallelism, reducing cache miss rates, and eliminating processor stall time. Although each of these optimizations is quite effective on its own, even better improvements are expected when they are combined. In this paper we show that this is indeed the case when unimodular transformations are combined with either tiling or software prefetching. However, our results also show that, although combining tiling with prefetching tends to improve on the performance of tiling alone, in some situations tiling can degrade the cache performance of software prefetching. The reasons for this unexpected behavior are threefold: 1) tiling introduces interference misses inside the localized space, which are difficult to characterize with current techniques based on locality analysis; 2) prefetch predicates are computed using only estimates of the number of capacity misses, so the latency induced by cache interference is not completely covered; and 3) tiling limits the maximum amount of latency that can be masked with prefetching.
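
For readers unfamiliar with the three optimizations, the following is a minimal sketch in C, not the authors' code or benchmark. It applies them to a row-major matrix multiply: a loop interchange (one simple unimodular transformation), tiling of the k and j loops, and software prefetching guarded by a predicate. The function names, the `tile` and `pf_dist` parameters, and the choice of matrix multiply are all assumptions made for illustration; `__builtin_prefetch` is a GCC/Clang builtin, so the sketch assumes one of those compilers.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline: C = A * B for n x n row-major matrices, loops in i, j, k order.
 * The innermost loop walks b with stride n, which is cache-unfriendly. */
void matmul_ijk(const double *a, const double *b, double *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

/* Loop interchange of j and k (a permutation, hence unimodular):
 * the innermost loop now walks both b and c with unit stride. */
void matmul_ikj(const double *a, const double *b, double *c, size_t n)
{
    for (size_t x = 0; x < n * n; x++)
        c[x] = 0.0;
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++) {
            double aik = a[i * n + k];
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += aik * b[k * n + j];
        }
}

/* Interchange + tiling of k and j + software prefetching.
 * tile and pf_dist are hypothetical tuning parameters (in elements). */
void matmul_tiled_prefetch(const double *a, const double *b, double *c,
                           size_t n, size_t tile, size_t pf_dist)
{
    assert(tile > 0);
    for (size_t x = 0; x < n * n; x++)
        c[x] = 0.0;
    for (size_t kk = 0; kk < n; kk += tile) {
        size_t k_end = (kk + tile < n) ? kk + tile : n;
        for (size_t jj = 0; jj < n; jj += tile) {
            size_t j_end = (jj + tile < n) ? jj + tile : n;
            for (size_t i = 0; i < n; i++)
                for (size_t k = kk; k < k_end; k++) {
                    double aik = a[i * n + k];
                    for (size_t j = jj; j < j_end; j++) {
                        /* Prefetch predicate: only issue a prefetch that
                         * stays inside the current tile row of b. */
                        if (j + pf_dist < j_end)
                            __builtin_prefetch(&b[k * n + j + pf_dist], 0, 3);
                        c[i * n + j] += aik * b[k * n + j];
                    }
                }
        }
    }
}
```

In this sketch the predicate never looks past `j_end`, so the usable prefetch distance is bounded by the tile width. That is one simplified way to picture the third point of the abstract, that tiling limits how much latency prefetching can mask. A production version would typically issue one prefetch per cache line rather than per element.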