Efficient execution of time-step computations with pipelined parallelism and inter-thread data locality optimizations

  • Authors: Apan Qasem
  • Affiliations: Texas State University, San Marcos, TX
  • Venue: Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores
  • Year: 2012

Abstract

This paper presents a strategy that integrates a set of compiler optimizations and analysis techniques to detect time-step loops and transform them for efficient execution on multicore platforms. Time-step computations, which appear frequently in scientific applications, are amenable to pipelined parallelism and exhibit a high degree of temporal locality. However, striking the right balance between data locality and parallelism often proves difficult, particularly on current multicore architectures, where one or more levels of the memory hierarchy are shared among multiple processing units. Our proposed strategy addresses performance issues related to both data locality and parallelism. By carefully orchestrating a set of source-to-source transformations, our technique exposes fine-grain parallelism within a time-step loop while improving its cache utilization and reducing its bandwidth requirements. Preliminary experiments with two time-step applications on three multicore platforms show that the code variants generated by our strategy incur significantly fewer misses in the shared caches and also achieve better execution times through reduced synchronization costs.
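To make the notion of a time-step loop and its temporal locality concrete, the following is a minimal sketch, not the paper's transformation. It assumes a 1D three-point averaging stencil with fixed boundary values and applies time skewing, a generic and well-known loop transformation used here purely for illustration. The function names `step_naive` and `step_skewed` and the `block` parameter are hypothetical. The skewed version visits points in blocks of the skewed coordinate `j = i + t`, so a block of data is reused across several time steps before the next block is touched. For simplicity it stores every time level, which a real implementation would avoid.

```python
def step_naive(u0, steps):
    """Reference time-step loop: sweep the whole array once per time step."""
    prev = list(u0)
    for _ in range(steps):
        cur = list(prev)  # boundary values stay fixed
        for i in range(1, len(prev) - 1):
            cur[i] = (prev[i - 1] + prev[i] + prev[i + 1]) / 3.0
        prev = cur
    return prev


def step_skewed(u0, steps, block):
    """Same computation, reordered by blocks of the skewed index j = i + t.

    Assumption for clarity: all time levels are kept in memory.
    """
    n = len(u0)
    levels = [list(u0) for _ in range(steps + 1)]
    # Interior points have i in [1, n-2] and t in [1, steps],
    # so j = i + t ranges over [2, n-2+steps].
    for j0 in range(2, n - 1 + steps, block):
        for t in range(1, steps + 1):
            lo = max(1, j0 - t)
            hi = min(n - 2, j0 + block - 1 - t)
            prev, cur = levels[t - 1], levels[t]
            for i in range(lo, hi + 1):
                # Inputs lie at skewed indices j-2 .. j on level t-1, which
                # belong to an earlier block or an earlier t of this block.
                cur[i] = (prev[i - 1] + prev[i] + prev[i + 1]) / 3.0
    return levels[steps]


if __name__ == "__main__":
    data = [float(i * i % 7) for i in range(20)]
    print(step_naive(data, 5) == step_skewed(data, 5, 4))
```

Because each point is computed from the same inputs with the same arithmetic in both versions, the two functions should return identical values. Only the traversal order changes, and that order is what determines cache reuse and how work could be divided among threads.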