A Synergetic Approach to Throughput Computing on x86-Based Multicore Desktops

  • Authors: Chi-Keung Luk, Ryan Newton, William Hasenplaugh, Mark Hampton, Geoff Lowney

  • Affiliation: Intel (all authors)

  • Venue: IEEE Software
  • Year: 2011

Abstract

In the era of multicores, many applications that require substantial computing power and data crunching can now run on desktop PCs. However, to achieve the best possible performance, developers must write applications in a way that exploits both parallelism and cache locality. This article proposes one such approach for x86-based architectures. It uses cache-oblivious techniques to divide a large problem into smaller subproblems, which are mapped to different cores or threads. The authors then use the compiler to exploit SIMD parallelism within each subproblem. Finally, they use autotuning to pick the best parameter values throughout the optimization process. The authors have implemented this approach with the Intel compiler and the newly developed Intel Software Autotuning Tool. Experimental results collected on a dual-socket quad-core Nehalem show that the approach achieves an average speedup of almost 20x over the best serial cases for an important set of computational kernels.
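
To make the three layers concrete, here is a minimal C++ sketch of the general idea applied to dense matrix multiplication. It is not the authors' code: the choice of kernel, the use of OpenMP tasks to map subproblems to threads, and the names `kernel`, `matmul`, `multiply`, `autotune_base`, and the `base` cutoff parameter are all assumptions made for illustration. The tuner is a simple timing loop and does not reflect the interface of the Intel Software Autotuning Tool.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <limits>
#include <vector>

using std::size_t;

// Base case: a block small enough to stay in cache. The unit-stride inner
// loop over j is the kind of loop a compiler can auto-vectorize with SIMD.
static void kernel(double* c, size_t ldc, const double* a, size_t lda,
                   const double* b, size_t ldb, size_t m, size_t n, size_t k) {
    for (size_t i = 0; i < m; ++i)
        for (size_t l = 0; l < k; ++l) {
            const double ail = a[i * lda + l];
            for (size_t j = 0; j < n; ++j)
                c[i * ldc + j] += ail * b[l * ldb + j];
        }
}

// C (m x n) += A (m x k) * B (k x n), all row-major with leading dimensions
// ldc, lda, ldb. Recursively halves the largest dimension until every
// dimension is at most `base` (hypothetical tunable cutoff).
void matmul(double* c, size_t ldc, const double* a, size_t lda,
            const double* b, size_t ldb, size_t m, size_t n, size_t k,
            size_t base) {
    assert(base > 0);
    if (m <= base && n <= base && k <= base) {
        kernel(c, ldc, a, lda, b, ldb, m, n, k);
        return;
    }
    if (m >= n && m >= k) {
        // Split rows of C: the halves write disjoint output, so they can
        // run as parallel tasks.
        const size_t h = m / 2;
        #pragma omp task
        matmul(c, ldc, a, lda, b, ldb, h, n, k, base);
        matmul(c + h * ldc, ldc, a + h * lda, lda, b, ldb, m - h, n, k, base);
        #pragma omp taskwait
    } else if (n >= k) {
        // Split columns of C: again disjoint output, so parallel tasks.
        const size_t h = n / 2;
        #pragma omp task
        matmul(c, ldc, a, lda, b, ldb, m, h, k, base);
        matmul(c + h, ldc, a, lda, b + h, ldb, m, n - h, k, base);
        #pragma omp taskwait
    } else {
        // Split the inner dimension: both halves update the same C block,
        // so they run one after the other.
        const size_t h = k / 2;
        matmul(c, ldc, a, lda, b, ldb, m, n, h, base);
        matmul(c, ldc, a + h, lda, b + h * ldb, ldb, m, n, k - h, base);
    }
}

// Entry point for dense row-major matrices (lda = k, ldb = ldc = n).
void multiply(double* c, const double* a, const double* b,
              size_t m, size_t n, size_t k, size_t base) {
    #pragma omp parallel
    #pragma omp single
    matmul(c, n, a, k, b, n, m, n, k, base);
}

// Toy autotuner: time each candidate cutoff on an n x n problem and return
// the fastest one. A single timing run per candidate is noisy; a real tuner
// would repeat measurements.
size_t autotune_base(size_t n, const std::vector<size_t>& candidates) {
    assert(!candidates.empty());
    std::vector<double> a(n * n, 1.0), b(n * n, 1.0), c;
    size_t best = candidates.front();
    double best_time = std::numeric_limits<double>::infinity();
    for (size_t base : candidates) {
        c.assign(n * n, 0.0);
        const auto t0 = std::chrono::steady_clock::now();
        multiply(c.data(), a.data(), b.data(), n, n, n, base);
        const std::chrono::duration<double> dt =
            std::chrono::steady_clock::now() - t0;
        if (dt.count() < best_time) {
            best_time = dt.count();
            best = base;
        }
    }
    return best;
}
```

A caller might first run `autotune_base(512, {16, 32, 64, 128})` once on the target machine and then pass the returned cutoff to `multiply`. When the compiler's OpenMP support is enabled, the row and column splits run as parallel tasks; when it is not, the pragmas are ignored and the same code runs serially, which is convenient for measuring a serial baseline.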