Maximizing Multiprocessor Performance with the SUIF Compiler

  • Authors:
  • Mary W. Hall, Jennifer M. Anderson, Saman P. Amarasinghe, Brian R. Murphy, Shih-Wei Liao, Edouard Bugnion, Monica S. Lam


  • Venue:
  • Computer
  • Year:
  • 1996


Abstract

Multiple processors can work together to speed up single applications, but sequential programs must be rewritten to take advantage of the extra processors. One way to do this is through automatic parallelization with a compiler. Multiprocessors pose especially challenging problems for parallelizing compilers. Sufficient work must be performed in parallel to overcome processor synchronization and communication overhead. Moreover, multiprocessor memory hierarchies are complex, containing both shared memory and multiple levels of cache memory. Thus, two techniques are essential in obtaining good multiprocessor performance for array-based numerical programs: locating coarse-grain parallelism and managing multiprocessor memory use. The authors describe new technology in the Stanford SUIF compiler that enables it to successfully carry out these techniques. First, a suite of robust analysis techniques operates across procedure boundaries to locate coarse-grain parallelism so that large computations can execute independently in parallel. Then, to help eliminate cache misses, affine partitioning is used to improve processor reuse of data, and data permutation and data strip-mining make contiguous the data accessed by each processor in the shared address space. When employed in the automatic parallelizing compiler, these techniques significantly improve the performance of half the programs in the NAS and SPECfp95 benchmark suites.