Prefetching irregular references for software cache on cell

  • Authors:
  • Tong Chen;Tao Zhang;Zehra Sura;Mar Gonzales Tallada

  • Affiliations:
  • IBM T. J. Watson Research Center, Yorktown Heights, NY, USA;IBM T. J. Watson Research Center, Yorktown Heights, NY, USA;IBM T. J. Watson Research Center, Yorktown Heights, NY, USA;IBM T. J. Watson Research Center, Yorktown Heights, NY, USA

  • Venue:
  • Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

The IBM Single Source Research Compiler for the Cell processor (the SSC Research Compiler) was developed to manage the complexity of programming the heterogeneous multicore Cell processor. The compiler accepts conventional source programs as input, and automatically generates binaries that execute on both the PPU and SPU cores available on a Cell chip. The compiler uses a software cache and direct buffers to manage data in the small local memory of SPUs. However, irregular references, such as a[ind[i]], often become performance bottle-necks. These references are accessed through software cache, usually with high miss rates. To solve this problem, we propose a method to prefetch irregular references accessed through a software cache that is built upon hardware such as Cell. This method includes code transformation in the compiler and a runtime library component for the software cache. Our design simplifies the synchronization required when prefetching into software cache, overlaps DMA operations for misses, and avoids frequent context switching to the miss handler. It also minimizes the cache pollution caused by prefetching, by looking both forwards and backwards through the sequence of addresses to be prefetched. We evaluated our prefetching method using the NAS benchmarks. We found that when applicable, our prefetching can improve the performance of some benchmarks by 2 times on average, and by close to 4 times in the best case. We also present data to show the impact of different configurations and optimizations when prefetching in a software cache.