MP-Tomasulo: A Dependency-Aware Automatic Parallel Execution Engine for Sequential Programs

Authors:
Chao Wang; Xi Li; Junneng Zhang; Xuehai Zhou; Xiaoning Nie
Affiliations:
University of Science and Technology of China; Suzhou Institute for University of Science and Technology of China; Suzhou Institute for University of Science and Technology of China; University of Science and Technology of China; Intel
Venue:
ACM Transactions on Architecture and Code Optimization (TACO)
Year:
2013

Abstract

This article presents MP-Tomasulo, a dependency-aware automatic parallel task execution engine for sequential programs. By applying the instruction-level Tomasulo algorithm to MPSoC environments, MP-Tomasulo detects and eliminates Write-After-Write (WAW) and Write-After-Read (WAR) inter-task dependencies during dataflow execution, which allows tasks to execute out of order on heterogeneous units. We implemented the prototype system on a single FPGA. Experimental results on EEMBC applications show that MP-Tomasulo executes tasks out of order and achieves 93.6% to 97.6% of the ideal peak speedup. A comparative study against a state-of-the-art dataflow execution scheme is presented using a classic JPEG application. These results indicate that MP-Tomasulo helps programmers uncover more task-level parallelism on heterogeneous systems while easing their programming burden.
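
The idea of removing WAW and WAR dependencies between tasks can be illustrated with a small software sketch. This is not the authors' implementation, which is a hardware engine on an FPGA; it only assumes that each task declares the variables it reads and writes, and it uses simple version-based renaming in the spirit of the Tomasulo algorithm. The names `Task`, `classify`, `rename`, and `hazards` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    # Hypothetical task descriptor: a name plus declared read and write sets.
    name: str
    reads: frozenset
    writes: frozenset


def classify(earlier, later):
    """Return the hazards that order `later` after `earlier`."""
    found = set()
    if earlier.writes & later.reads:
        found.add("RAW")  # true dependency: later consumes earlier's result
    if earlier.reads & later.writes:
        found.add("WAR")  # false dependency: later overwrites an input
    if earlier.writes & later.writes:
        found.add("WAW")  # false dependency: both write the same name
    return found


def rename(tasks):
    """Give every write a fresh version so that only RAW hazards remain."""
    version = {}  # variable name -> latest version number
    renamed = []
    for t in tasks:
        # Reads bind to the latest version produced so far (0 = initial value).
        reads = frozenset(f"{v}#{version.get(v, 0)}" for v in t.reads)
        writes = set()
        for v in t.writes:
            version[v] = version.get(v, 0) + 1
            writes.add(f"{v}#{version[v]}")
        renamed.append(Task(t.name, reads, frozenset(writes)))
    return renamed


def hazards(tasks):
    """Map each (earlier, later) task pair to its set of hazards."""
    result = {}
    for j, later in enumerate(tasks):
        for earlier in tasks[:j]:
            found = classify(earlier, later)
            if found:
                result[(earlier.name, later.name)] = found
    return result


# A three-task sequential program, in program order.
program = [
    Task("t1", frozenset({"a"}), frozenset({"b"})),  # b = f(a)
    Task("t2", frozenset({"b"}), frozenset({"c"})),  # c = g(b)
    Task("t3", frozenset({"d"}), frozenset({"b"})),  # b = h(d)
]

before = hazards(program)
after = hazards(rename(program))
print(before)
print(after)
```

In this example, `before` contains a RAW hazard between `t1` and `t2`, a WAW hazard between `t1` and `t3`, and a WAR hazard between `t2` and `t3`. After renaming, only the RAW hazard between `t1` and `t2` remains, so `t3` no longer has to wait for either earlier task and could be issued out of order.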