Novel architecture for loop acceleration: a case study

Authors:
Seng Lin Shee; Sri Parameswaran; Newton Cheung
Affiliations:
University of New South Wales, Sydney, Australia; University of New South Wales, Sydney, Australia; University of New South Wales, Sydney, Australia
Venue:
CODES+ISSS '05 Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Year:
2005

Abstract

In this paper, we show a novel approach to accelerating loops by tightly coupling a coprocessor to an ASIP. Latency hiding is used to exploit the parallelism available in this architecture. To illustrate the advantages of this approach, we investigate a JPEG encoding algorithm and accelerate one of its loops by implementing it in a coprocessor. We contrast the acceleration obtained by implementing the critical segment as two different coprocessors and as a set of customized instructions. The two coprocessor approaches are a high-level synthesis (HLS) approach and a custom coprocessor approach. The HLS approach provides a faster method of generating coprocessors. Relative to the main processor alone, the custom coprocessor approach improves loop performance by 2.57x, compared with 1.58x for the HLS approach and 1.33x for the customized instruction approach. The respective energy savings within the loop are 57%, 28% and 19%.
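
To make the reported figures easier to compare, the following is a minimal back-of-the-envelope sketch in Python that converts each quoted speedup and energy saving into a loop time and loop energy relative to the main processor alone. The function names are hypothetical. The implied average power ratio is not a figure from the paper: it is an inference that assumes energy equals average power times execution time and that both numbers were measured over the same loop.

```python
# Figures quoted in the abstract: (loop speedup, fractional loop energy saving).
REPORTED = {
    "custom coprocessor": (2.57, 0.57),
    "HLS coprocessor": (1.58, 0.28),
    "customized instructions": (1.33, 0.19),
}


def relative_loop_time(speedup):
    """Loop execution time as a fraction of the baseline time."""
    return 1.0 / speedup


def relative_loop_energy(saving):
    """Loop energy as a fraction of the baseline energy."""
    return 1.0 - saving


def implied_power_ratio(speedup, saving):
    """Average loop power relative to baseline.

    Assumption (not stated in the abstract): energy = average power * time,
    with speedup and saving measured over the same loop.
    """
    return relative_loop_energy(saving) / relative_loop_time(speedup)


def summarize():
    """Return {approach: (relative time, relative energy, implied power ratio)}."""
    return {
        name: (
            relative_loop_time(speedup),
            relative_loop_energy(saving),
            implied_power_ratio(speedup, saving),
        )
        for name, (speedup, saving) in REPORTED.items()
    }


if __name__ == "__main__":
    for name, (t, e, p) in summarize().items():
        print(f"{name}: time x{t:.2f}, energy x{e:.2f}, implied power x{p:.2f}")
```

Under these assumptions, an implied power ratio above 1 for an approach would indicate that its energy saving comes from the shorter execution time rather than from lower average power.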