Automatic architecture refinement techniques for customizing processing elements

Authors:
Bita Gorjiara; Daniel Gajski
Affiliations:
University of California, Irvine; University of California, Irvine
Venue:
Proceedings of the 45th annual Design Automation Conference
Year:
2008

Abstract

In this paper, we propose an approach for designing high-performance, energy-efficient processing elements (PEs) using statically scheduled, nanocode-based architectures. The approach relies on bottom-up refinement (trimming) techniques that optimize a given datapath regardless of whether it was designed manually or generated automatically. The optimizations can also preserve parts of the netlist specified by the designer, which allows reuse of design effort and can lead to predictable convergence. We show that trimming unused and underutilized resources of typical general-purpose datapaths can yield 30-40% average energy savings without any performance loss. General-purpose architectures, however, often compromise parallelism to keep the design implementable. With trimming, we can instead start from a base architecture that has more parallelism and is not intended for implementation, and then apply refinement to make it implementable. For our benchmarks, we achieved performance improvements of up to 1.8 times (avg. 25%) and 2.6 times (avg. 40%) compared to two general-purpose architectures (a 4-issue VLIW and a DLX), respectively. Additionally, energy consumption is reduced by up to 5 times (avg. 2 times) compared to the trimmed general-purpose architectures.
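The trimming idea in the abstract can be illustrated with a minimal sketch. It is not the authors' tool or algorithm; the `Component` class, the `utilization` and `trim` functions, the schedule format (one set of active component names per cycle), and the example datapath are all hypothetical names chosen for illustration. The sketch covers only the simplest case: components that a fixed static schedule never activates are removed, which leaves that schedule's cycle count unchanged, while components marked as preserved by the designer are always kept. Handling underutilized components would additionally require rescheduling, which is not attempted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """A datapath resource (hypothetical model, not from the paper)."""
    name: str
    preserved: bool = False  # designer-pinned parts are never trimmed


def utilization(schedule, components):
    """Return the fraction of cycles in which each component is active.

    `schedule` is a list with one set of active component names per cycle.
    Every name in the schedule is assumed to belong to `components`.
    """
    cycles = len(schedule)
    counts = {c.name: 0 for c in components}
    for active in schedule:
        for name in active:
            counts[name] += 1
    return {n: (k / cycles if cycles else 0.0) for n, k in counts.items()}


def trim(components, schedule):
    """Drop components the schedule never uses, unless they are preserved."""
    util = utilization(schedule, components)
    kept = [c for c in components if c.preserved or util[c.name] > 0.0]
    removed = [c.name for c in components if c not in kept]
    return kept, removed, util


# Hypothetical general-purpose datapath and a four-cycle static schedule.
datapath = [
    Component("alu0"),
    Component("alu1"),
    Component("mul0"),
    Component("div0"),
    Component("dbg_port", preserved=True),
]
schedule = [{"alu0", "mul0"}, {"alu0"}, {"alu0", "alu1"}, {"mul0"}]

kept, removed, util = trim(datapath, schedule)
print([c.name for c in kept])
print(removed)
```

In this example the divider is never scheduled and is trimmed, while the unused but designer-preserved port survives. The utilization figures returned by `trim` are the kind of data a refinement step could use to pick candidates among underutilized components.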