MultiMaKe: Chip-multiprocessor driven memory-aware kernel pipelining

Authors:
Luis Angel D. Bathen; Yongjin Ahn; Sudeep Pasricha; Nikil D. Dutt
Affiliations:
University of California, Irvine, CA; University of California, Irvine, CA; Colorado State University, Fort Collins; University of California, Irvine, CA
Venue:
ACM Transactions on Embedded Computing Systems (TECS) - Special section on ESTIMedia'12, LCTES'11, rigorous embedded systems design, and multiprocessor system-on-chip for cyber-physical systems
Year:
2013

Abstract

The increasing demand for low-power, high-performance multimedia embedded systems has motivated the need for effective solutions that satisfy application bandwidth and latency requirements under a tight power budget. As technology scales, it is imperative that applications be optimized to take full advantage of the underlying resources and to meet both power and performance requirements. We propose MultiMaKe, an application mapping design flow capable of discovering and enabling parallelism opportunities via code transformations, efficiently distributing the computational load across resources, and minimizing unnecessary data transfers. Our approach decomposes the application's tasks into smaller units of computation called kernels, which are distributed and pipelined across the different processing resources. We exploit three ideas: inter-kernel data reuse to minimize unnecessary data transfers between kernels, early execution edges to drive performance, and kernel pipelining to increase system throughput. Our experimental results on JPEG and JPEG2000 show up to 97% off-chip memory access reduction and up to 80% execution time reduction over standard mapping and task-level pipelining approaches.
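
To make the throughput argument for kernel pipelining concrete, the following is a minimal sketch of a toy timing model, not the MultiMaKe flow itself. It assumes each kernel is bound to its own processor, every data block passes through the kernels in order, and latencies are fixed per kernel. The function names, the latencies, and the block count are all hypothetical and are not taken from the paper.

```python
def sequential_time(kernel_latencies, num_blocks):
    """All kernels run back to back on a single processor for every block."""
    return num_blocks * sum(kernel_latencies)


def pipelined_time(kernel_latencies, num_blocks):
    """One kernel per processor; blocks flow through the kernels in order.

    finish[k] holds the time at which kernel k finished its most recent block.
    A kernel starts a block once the previous kernel has produced it and the
    kernel's own processor is free.
    """
    finish = [0] * len(kernel_latencies)
    for _ in range(num_blocks):
        upstream_done = 0
        for k, latency in enumerate(kernel_latencies):
            start = max(upstream_done, finish[k])
            finish[k] = start + latency
            upstream_done = finish[k]
    return finish[-1]


# Hypothetical example: three kernels, ten data blocks (assumed values).
latencies = [4, 6, 3]
blocks = 10
seq = sequential_time(latencies, blocks)
pipe = pipelined_time(latencies, blocks)
print(f"sequential: {seq}, pipelined: {pipe}")
```

In this simplified model the pipelined time equals the sum of the kernel latencies plus (number of blocks - 1) times the slowest kernel's latency, so the slowest kernel bounds the achievable throughput. The model ignores data transfer cost and memory capacity, which the abstract identifies as central concerns of the actual approach.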