Data Access Partitioning for Fine-grain Parallelism on Multicore Architectures

  • Authors:
  • Michael Chu, Rajiv Ravindran, Scott Mahlke

  • Venue:
  • Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
  • Year:
  • 2007

Abstract

The recent design shift towards multicore processors has spawned a significant amount of research in the area of program parallelization. The future abundance of cores on a single chip requires programmer and compiler intervention to increase the amount of parallel work possible. Much of the recent work has fallen into the areas of coarse-grain parallelization: new programming models and different ways to exploit threads and data-level parallelism. This work focuses on a complementary direction, improving performance through automated fine-grain parallelization. The main difficulty in achieving a performance benefit from fine-grain parallelism is the distribution of data memory accesses across the data caches of each core. Poor choices in the placement of data accesses can lead to increased memory stalls and low resource utilization. We propose a profile-guided method for partitioning memory accesses across distributed data caches. First, a profile determines affinity relationships between memory accesses and working set characteristics of individual memory operations in the program. Next, a program-level partitioning of the memory operations is performed to divide the memory accesses across the data caches. As a result, the data accesses are proactively dispersed to reduce memory stalls and improve computation parallelization. A final detailed partitioning of the computation instructions is performed with knowledge of the cache location of their associated data. Overall, our data partitioning reduces stall cycles by up to 51% versus data-incognizant partitioning, and achieves an average speedup of 30% over a single-core processor.
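
The abstract outlines a two-phase approach: profile-derived affinities between memory operations, followed by a program-level assignment of those operations to per-core data caches. As a rough illustration of that idea, the Python sketch below greedily places each memory operation on the cache where its affinity-weighted neighbors already reside, subject to a per-cache working-set budget. All names and parameters (MemOp, wset_bytes, NUM_CACHES, CACHE_CAPACITY) are illustrative assumptions, not the paper's actual algorithm or implementation.

```python
# Hypothetical sketch of profile-guided memory-access partitioning, loosely
# following the abstract's phases: (1) profile-derived affinities between
# memory operations, (2) greedy assignment of operations to per-core data
# caches. Names and constants are assumptions made for illustration.

from dataclasses import dataclass, field

NUM_CACHES = 2          # assumed: one data cache per core
CACHE_CAPACITY = 32768  # assumed per-cache working-set budget, in bytes

@dataclass
class MemOp:
    name: str
    wset_bytes: int  # profiled working-set size of this operation
    # Profiled affinity: other op name -> count of shared data accesses.
    affinity: dict = field(default_factory=dict)

def partition(ops):
    """Greedily place each op on the cache holding the most of its
    affinity-weighted neighbors, subject to the capacity budget."""
    placement = {}            # op name -> cache id
    used = [0] * NUM_CACHES   # bytes consumed per cache
    # Place heavier working sets first so they anchor the partitions.
    for op in sorted(ops, key=lambda o: o.wset_bytes, reverse=True):
        def gain(c):
            # Affinity mass already resident on cache c.
            return sum(w for other, w in op.affinity.items()
                       if placement.get(other) == c)
        feasible = [c for c in range(NUM_CACHES)
                    if used[c] + op.wset_bytes <= CACHE_CAPACITY]
        candidates = feasible or list(range(NUM_CACHES))
        # Prefer high affinity; break ties toward the less-loaded cache.
        best = max(candidates, key=lambda c: (gain(c), -used[c]))
        placement[op.name] = best
        used[best] += op.wset_bytes
    return placement

if __name__ == "__main__":
    a = MemOp("ld_a", 8192, {"st_b": 40})
    b = MemOp("st_b", 4096, {"ld_a": 40})
    c = MemOp("ld_c", 16384, {})
    print(partition([a, b, c]))  # ld_a and st_b end up on the same cache
```

Placing operations with high mutual affinity on the same cache is a simple proxy for the paper's stated goal: dispersing data accesses proactively to cut memory stalls while keeping every core's cache usefully occupied, before the computation instructions are partitioned around the resulting data placement.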