Improving parallelism and locality with asynchronous algorithms

  • Authors:
  • Lixia Liu;Zhiyuan Li

  • Affiliations:
  • Purdue University, West Lafayette, USA;Purdue University, West Lafayette, USA

  • Venue:
  • Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

As multicore chips become the main building blocks for high performance computers, many numerical applications face a performance impediment due to the limited hardware capacity to move data between the CPU and the off-chip memory. This is especially true for large computing problems solved by iterative algorithms because of the large data set typically used. Loop tiling, also known as loop blocking, was shown previously to be an effective way to enhance data locality, and hence to reduce the memory bandwidth pressure, for a class of iterative algorithms executed on a single processor. Unfortunately, the tiled programs suffer from reduced parallelism because only the loop iterations within a single tile can be easily parallelized. In this work, we propose to use the asynchronous model to enable effective loop tiling such that both parallelism and locality can be attained simultaneously. Asynchronous algorithms were previously proposed to reduce the communication cost and synchronization overhead between processors. Our new discovery is that carefully controlled asynchrony and loop tiling can significantly improve the performance of parallel iterative algorithms on multicore processors due to simultaneously attained data locality and loop-level parallelism. We present supporting evidence from experiments with three well-known numerical kernels.