Models for generating locality-tuned traveling threads for a hierarchical multi-level heterogeneous multicore

  • Authors:
  • Patrick Anthony La Fratta;Peter M. Kogge

  • Affiliations:
  • University of Notre Dame, Notre Dame, IN, USA;University of Notre Dame, Notre Dame, IN, USA

  • Venue:
  • Proceedings of the 7th ACM international conference on Computing frontiers
  • Year:
  • 2010

Abstract

As heterogeneous multicore processors become more widespread, many options are emerging for producing efficient parallel code for them. Although parallel programming languages are improving, manually partitioning computations and data across heterogeneous processing resources is proving extraordinarily difficult. Further, it is becoming increasingly important to consider locality when producing parallel code, as data transport is a primary source of performance overhead and energy consumption. To address these problems, we present a hierarchical multi-level heterogeneous processor called the Passive/Active Multicore (PAM) and propose a novel model for extracting parallel computations from sequential code for it. The computations take the form of short, fine-grained threads, which are generated with attention to locality through cache profiling and can migrate from core to core up through the memory hierarchy based on the location of their operands. Experimental results across integer-intensive and floating-point-intensive standard and scientific workloads show that the architecture, execution model, and computational extraction techniques together offer computational offloads of up to 24% (5.8% on average). Through simulation, we estimate that these offloads may translate into speedups of up to 19% (4.0% on average) and that negative effects on performance are negligible. Floating-point applications appear to benefit most from these techniques.