Hardware/software support for adaptive work-stealing in on-chip multiprocessor

  • Authors: Quentin Meunier; Frédéric Pétrot; Jean-Louis Roch

  • Affiliations: TIMA Laboratory, INP Grenoble, 46, avenue Félix Viallet, 38031 Grenoble Cedex, France; TIMA Laboratory, INP Grenoble, 46, avenue Félix Viallet, 38031 Grenoble Cedex, France; LIG, INP Grenoble and INRIA, 51, avenue Jean Kuntzmann, 38330 Montbonnot Saint-Martin, France

  • Venue: Journal of Systems Architecture: the EUROMICRO Journal
  • Year: 2010

Abstract

During the past few years, embedded digital systems have been required to provide a huge amount of processing power and functionality. A likely next step in this trend toward greater computational power and flexibility is the generalization of on-chip multiprocessor platforms (MPSoCs). In that context, choosing a programming model and providing optimized hardware support for it on these platforms is a challenging task. To deal portably with MPSoCs that have different numbers of processors, possibly running at different frequencies, work-stealing (WS) based parallelization is a current research trend. The contribution of this paper is to evaluate the impact of some simple MPSoC architectural characteristics on the performance of WS in the MPSoC context. Previous evaluations of WS, whether theoretical or experimental, were carried out on fixed multicore architectures. This work extends those studies by exploring the use of WS for the codesign of embedded applications on MPSoC platforms with different hardware capabilities, using cycle-accurate measurements. We first study the architectural choices suited to WS algorithms and measure the benefit of these architectural modifications. To assess whether WS is suited to the MPSoC context, we experimentally measure its intrinsic implementation overhead on the most efficient architectural designs. Finally, we validate the performance of the approach on two real applications: a regular multimedia application (temporal noise reduction) and an irregular, computation-intensive application (frames of the Mandelbrot set). Our results show that enhancing MPSoC platforms of up to 16 processors with widespread hardware support mechanisms can lead to significant performance improvements at acceptable hardware cost for the considered applications.