Polymorphic pipeline array: a flexible multicore accelerator with virtualized execution for mobile multimedia applications

Authors:
Hyunchul Park;Yongjun Park;Scott Mahlke
Affiliations:
University of Michigan, Ann Arbor, MI;University of Michigan, Ann Arbor, MI;University of Michigan, Ann Arbor, MI
Venue:
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Year:
2009

Abstract

Mobile computing in the form of smartphones, netbooks, and personal digital assistants has become an integral part of everyday life. Looking ahead to the next generation of mobile devices, we believe that multimedia will become a more critical and product-differentiating feature. High-definition audio and video, as well as 3D graphics, provide richer interfaces and compelling capabilities. However, these algorithms pose different computational challenges than wireless signal processing does. Multimedia algorithms are more complex, with more control flow and variable computational requirements, and their execution time is not dominated by innermost vector loops. Data access is also more complex, because media applications typically operate on multi-dimensional vectors of data rather than single-dimensional vectors with simple strides. The design of current mobile platforms therefore requires re-examination to account for these new application domains. In this work, we focus on the design of a programmable, low-power accelerator for multimedia algorithms, referred to as a Polymorphic Pipeline Array (PPA). The PPA is designed with flexibility and programmability as first-order requirements, so that the hardware can be dynamically customized to the application. PPAs exploit the pipeline parallelism found in streaming media applications by forming a coarse-grain hardware pipeline to execute them. PPA resources are allocated to each pipeline stage according to its size and its ability to exploit fine-grain parallelism. Experimental results show that real-time media applications can take advantage of this static and dynamic configurability for increased power efficiency.
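
The idea of sizing each pipeline stage's share of the array by its workload and available parallelism can be illustrated with a small sketch. This is not the allocation scheme from the paper, which the abstract does not describe; it is a generic greedy balancing heuristic under stated assumptions. The names `Stage`, `stage_time`, and `allocate_cores`, the per-stage `work` and `max_parallel` figures, and the ideal-speedup cost model are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    work: float        # hypothetical cost per item on a single core
    max_parallel: int  # hypothetical cap on cores the stage can use profitably


def stage_time(stage: Stage, cores: int) -> float:
    """Per-item time, assuming ideal speedup up to the stage's parallelism cap."""
    return stage.work / min(cores, stage.max_parallel)


def allocate_cores(stages: list[Stage], total_cores: int) -> dict[str, int]:
    """Greedy balancing: give each stage one core, then hand each spare core
    to the slowest stage that can still use another one."""
    if total_cores < len(stages):
        raise ValueError("need at least one core per stage")
    alloc = {s.name: 1 for s in stages}
    for _ in range(total_cores - len(stages)):
        candidates = [s for s in stages if alloc[s.name] < s.max_parallel]
        if not candidates:
            break  # no stage can use more cores; leave the rest idle
        slowest = max(candidates, key=lambda s: stage_time(s, alloc[s.name]))
        alloc[slowest.name] += 1
    return alloc


if __name__ == "__main__":
    # Made-up three-stage media pipeline on a made-up 8-core array.
    pipeline = [
        Stage("decode", work=400, max_parallel=4),
        Stage("transform", work=200, max_parallel=2),
        Stage("filter", work=100, max_parallel=1),
    ]
    print(allocate_cores(pipeline, total_cores=8))
```

Because pipeline throughput is bounded by the slowest stage, the heuristic spends cores where they shorten the bottleneck and stops once no stage can use more, leaving any remaining cores idle.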