A Novel Application-specific Instruction-set Processor Design Approach for Video Processing Acceleration

Authors:
Mame Maria Mbaye;Normand Bélanger;Yvon Savaria;Samuel Pierre
Affiliations:
Department of Electrical Engineering, École Polytechnique de Montréal, Montréal, Canada H3C 3A7;Department of Electrical Engineering, École Polytechnique de Montréal, Montréal, Canada H3C 3A7;Department of Electrical Engineering, École Polytechnique de Montréal, Montréal, Canada H3C 3A7;Department of Computer Engineering, École Polytechnique de Montréal, Montréal, Canada H3C 3A7
Venue:
Journal of VLSI Signal Processing Systems
Year:
2007

Abstract

Application-specific instruction-set processors (ASIPs) provide a good alternative for video processing acceleration, but the productivity gap implied by such a new technology may prevent leveraging it fully. Video processing SoCs need flexibility that is not available in pure hardware architectures, while pure software solutions do not meet video processing performance constraints. ASIP design can therefore offer a good tradeoff between performance and flexibility. Video processing algorithms are often characterized by intrinsic parallelism that can be exploited by specialized ASIP instructions. In this paper, we propose a new approach for exploiting sequences of tightly coupled specialized instructions in ASIP design, applicable to video processing. Our approach, which avoids costly data communications by applying data grouping and data reuse, consists of accelerating an algorithm's critical loops by transforming them according to a new intermediate representation. This representation is then optimized, and opportunities for loop parallelism are explored. The approach has been applied to video processing algorithms such as the ELA deinterlacer and the 2D-DCT. Experimental results show speedups of up to 18 on the considered applications, while the hardware overhead in terms of additional logic gates was found to be between 18% and 59%.
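
To make the kind of critical loop mentioned above more concrete, here is a minimal sketch of a common three-direction form of edge-based line averaging (ELA) deinterlacing, written in plain Python. It is an illustration only, not the paper's intermediate representation or its specialized instructions; the function name `ela_interpolate_line`, the border handling, and the tie-breaking rule (vertical direction preferred) are assumptions made for this example.

```python
def ela_interpolate_line(upper, lower):
    """Interpolate one missing line between two known lines.

    Textbook-style 3-direction ELA (assumed variant, not taken from the paper):
    for each pixel, pick the direction with the smallest absolute difference
    between the upper and lower line, then average that pair of pixels.
    """
    if len(upper) != len(lower):
        raise ValueError("upper and lower lines must have the same length")

    n = len(upper)
    out = [0] * n
    for x in range(n):
        # Assumption: border pixels fall back to plain vertical averaging.
        if x == 0 or x == n - 1:
            out[x] = (upper[x] + lower[x]) // 2
            continue

        # (absolute difference, sum of the pixel pair) for each direction.
        candidates = [
            (abs(upper[x] - lower[x]), upper[x] + lower[x]),                  # vertical
            (abs(upper[x - 1] - lower[x + 1]), upper[x - 1] + lower[x + 1]),  # diagonal
            (abs(upper[x + 1] - lower[x - 1]), upper[x + 1] + lower[x - 1]),  # anti-diagonal
        ]
        # min() returns the first minimal entry, so vertical wins ties.
        _, pair_sum = min(candidates, key=lambda c: c[0])
        out[x] = pair_sum // 2
    return out


# A diagonal edge: plain vertical averaging would blur it to [0, 50, 50, 100, 100].
print(ela_interpolate_line([0, 0, 0, 100, 100], [0, 100, 100, 100, 100]))
```

In this sketch, each interior output pixel reads six neighbours, and four of them are read again for the next output pixel. A sliding window of this shape is one place where grouping and reusing data between consecutive operations could plausibly pay off.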