Hardware/software support for adaptive work-stealing in on-chip multiprocessor

Authors:
Quentin Meunier;Frédéric Pétrot;Jean-Louis Roch
Affiliations:
TIMA Laboratory, INP Grenoble, 46, avenue Félix Viallet, 38031 Grenoble Cedex, France;TIMA Laboratory, INP Grenoble, 46, avenue Félix Viallet, 38031 Grenoble Cedex, France;LIG, INP Grenoble and INRIA, 51, avenue Jean Kuntzmann, 38330 Montbonnot Saint-Martin, France
Venue:
Journal of Systems Architecture: the EUROMICRO Journal
Year:
2010

Abstract

Over the past few years, embedded digital systems have been expected to provide ever more processing power and functionality. A likely next step in sustaining this trend toward greater computational power and flexibility is the widespread adoption of on-chip multiprocessor platforms (MPSoCs). In that context, choosing a programming model and providing optimized hardware support for it on these platforms is a challenging task. Work-stealing (WS) based parallelization is a current research trend for handling, in a portable way, MPSoCs that differ in their number of processors and whose processors may run at different frequencies. The contribution of this paper is to evaluate the impact of some simple MPSoC architectural characteristics on the performance of WS. Previous evaluations of WS, whether theoretical or experimental, were carried out on fixed multicore architectures. This work extends those studies by using cycle-accurate measurements to explore WS for the codesign of embedded applications on MPSoC platforms with different hardware capabilities. We first study the architectural choices suited to WS algorithms and measure the benefit of these architectural modifications. To assess whether WS is suited to the MPSoC context, we then experimentally measure its intrinsic implementation overhead on the most efficient architectural designs. Finally, we validate the performance of the approach on two real applications: a regular multimedia application (temporal noise reduction) and an irregular, computation-intensive application (frames of the Mandelbrot set). Our results show that, for the considered applications, enhancing MPSoC platforms of up to 16 processors with widespread hardware support mechanisms can lead to significant performance improvements at an acceptable hardware cost.
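To illustrate why work-stealing adapts to processors running at different frequencies, the following is a minimal single-threaded simulation sketch, not the implementation evaluated in the paper. The function name `simulate_work_stealing`, the discrete time-step model, the placement of all initial work on worker 0, and the "steal from the most loaded victim" policy are all assumptions made for illustration (practical WS schedulers commonly pick a victim at random).

```python
from collections import deque


def simulate_work_stealing(task_costs, speeds):
    """Toy discrete-time model of work-stealing (illustrative assumption).

    task_costs: work units per task; all tasks start on worker 0.
    speeds: work units each worker can complete per time step.
    Returns (number_of_steps, work_done_per_worker).
    """
    n = len(speeds)
    queues = [deque() for _ in range(n)]
    queues[0].extend(task_costs)   # all work initially belongs to worker 0
    remaining = [0] * n            # work left in each worker's current task
    done = [0] * n                 # total work completed by each worker
    steps = 0

    while any(queues) or any(remaining):
        steps += 1
        for w in range(n):
            if remaining[w] == 0:
                if queues[w]:
                    # Owner takes the newest task from its own queue.
                    remaining[w] = queues[w].pop()
                else:
                    # Idle worker steals the oldest task from the most
                    # loaded queue (deterministic policy, an assumption).
                    victim = max(range(n), key=lambda v: len(queues[v]))
                    if queues[victim]:
                        remaining[w] = queues[victim].popleft()
            # Advance the current task by at most this worker's speed.
            progress = min(speeds[w], remaining[w])
            remaining[w] -= progress
            done[w] += progress

    return steps, done


if __name__ == "__main__":
    # Two workers, the first twice as fast as the second.
    print(simulate_work_stealing([4] * 8, [2, 1]))
```

In this toy model the faster worker ends up completing more of the total work without any static partitioning, which is the portability property the abstract attributes to WS. The model ignores steal latency and synchronization cost, which are precisely the overheads the paper measures on real architectures.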