Loop acceleration exploration for ASIP architecture

Authors:
Mame Maria Mbaye;Normand Bélanger;Yvon Savaria;Samuel Pierre
Affiliations:
Department of Electrical Engineering, Ecole Polytechnique de Montreal, Montreal, QC, Canada;Department of Electrical Engineering, Ecole Polytechnique de Montreal, Montreal, QC, Canada;Department of Electrical Engineering, Ecole Polytechnique de Montreal, Montreal, QC, Canada;Department of Electrical Engineering, Ecole Polytechnique de Montreal, Montreal, QC, Canada
Venue:
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Year:
2012

Abstract

Design space exploration is a delicate process whose success rests on the designer's shoulders. It is often based on trial and error, although some basic metrics can guide it. In this paper, we explore the acceleration of loops from C-based specifications. We built a framework in which a design style, such as software-oriented or application-specific instruction-set processor (ASIP)-oriented design, can be specified. We also propose an exploration process that targets the main aspects that limit acceleration and the actions that can be taken to improve it. The process is based on new loop-oriented metrics that provide insight into key design issues. These metrics help determine which aspects of the design, data accesses or arithmetic logic unit (ALU)/control operations, limit loop acceleration or allow its opportunities to be leveraged. We profile benchmarks from the signal and image processing fields, such as the Turbo Decoder and JPEG algorithms, to illustrate how loop-oriented metrics help point out aspects that limit or improve loop acceleration. The loop acceleration process was also used to explore architectures that leverage, as much as possible, the loop acceleration opportunities of the sum of absolute differences (SAD) algorithm.
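
For readers unfamiliar with the SAD kernel mentioned above, the following is a minimal C sketch of the kind of loop nest such an exploration would start from. It is not taken from the paper: the function name sad_block, the 8-bit pixel type, and the row-stride layout are assumptions made for illustration. The comments separate the data accesses from the ALU operations in the loop body, which is the distinction the abstract draws. They are an informal reading of the source code and do not reproduce the paper's metrics.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum of absolute differences between two blocks of 8-bit pixels.
 * Hypothetical layout: both blocks are stored row by row, and
 * `stride` is the number of bytes between the starts of two rows. */
uint32_t sad_block(const uint8_t *cur, const uint8_t *ref,
                   size_t width, size_t height, size_t stride)
{
    uint32_t sum = 0;

    for (size_t y = 0; y < height; ++y) {
        for (size_t x = 0; x < width; ++x) {
            /* Data accesses: two loads per inner iteration. */
            int a = cur[y * stride + x];
            int b = ref[y * stride + x];

            /* ALU operations: subtract, absolute value, accumulate. */
            int d = a - b;
            sum += (uint32_t)(d < 0 ? -d : d);
        }
    }
    return sum;
}
```

Calling sad_block(cur, ref, 16, 16, 16) on two contiguous 16x16 blocks runs the inner body 256 times. Whether the two loads or the three arithmetic steps dominate on a given target depends on its memory system and instruction set, which is the kind of question the abstract says the loop-oriented metrics are meant to answer.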