Utilizing Horizontal and Vertical Parallelism with a No-Instruction-Set Compiler for Custom Datapaths

Authors:
Mehrdad Reshadi; Bita Gorjiara; Daniel Gajski
Affiliations:
Center for Embedded Computer Systems (CECS), University of California, Irvine (all authors)
Venue:
ICCD '05: Proceedings of the 2005 International Conference on Computer Design
Year:
2005

Abstract

Program performance can be improved by exploiting both horizontal and vertical parallelism. In some processors (e.g., VLIW-based ones), the compiler exploits horizontal parallelism by controlling the schedule of independent operations. Vertical parallelism is exploited through pipelining; however, in these processors the pipeline structure is fixed and the compiler has no control over it. In Application-Specific Instruction-set Processors (ASIPs), the pipeline structure can be customized and exploited by the program through custom instructions. Practical constraints on the instruction decoder, however, limit the number and complexity of custom instructions in an ASIP, and identifying frequent, beneficial custom instructions and incorporating them into the compiler are complex and sometimes very time-consuming tasks.

In this paper, we present an architecture that places no limit on the number of custom functionalities that can be implemented on its datapath. Instead of defining custom instructions and relying on a hardware decoder to generate the control signals, we generate the control signal values in the compiler. Since this architecture has no predefined instructions, we call it a No-Instruction-Set Computer (NISC). The NISC compiler maps the application directly onto the datapath. Because it has complete, fine-grained control over the datapath, it can make effective use of the hardware resources as well as the horizontal and vertical parallelism in the program. We also describe the algorithm for mapping a program's control/data flow graph (CDFG) onto a given NISC datapath.

Using our algorithm on a NISC architecture with a MIPS datapath, we achieved up to 70% speedup over the traditional MIPS compiler. In another experiment, we started from a base architecture and customized it by adding resources and interconnects to increase its horizontal and vertical parallelism; there, the algorithm achieved up to 15.5 times speedup by exploiting the parallelism available in the program and the datapath.
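The abstract does not spell out the NISC mapping algorithm. As a rough illustration of the central idea, a compiler emitting per-cycle control words for a given datapath instead of instructions, here is a generic resource-constrained list scheduler in Python. It is not the authors' algorithm: the `Op`/`Unit` classes, the critical-path priority, and the latency/pipelining model are assumptions made for this sketch. It also handles only a single basic block (an acyclic data-flow graph, not a full CDFG) and ignores register-file and interconnect constraints.

```python
from dataclasses import dataclass, field
from functools import lru_cache

@dataclass
class Op:
    name: str
    kind: str                                 # operation type, e.g. "add" or "mul" (hypothetical)
    deps: list = field(default_factory=list)  # names of ops whose results this op consumes

@dataclass
class Unit:
    name: str
    kind: str        # operation type this functional unit executes
    latency: int     # cycles from issue until the result can be consumed
    pipelined: bool  # True: accepts a new op every cycle; False: busy for `latency` cycles

def list_schedule(ops, units):
    """Greedy resource-constrained list scheduling of an acyclic data-flow graph.

    Returns per-cycle control words: control[c] maps each unit name to the op
    issued on it in cycle c (idle units are omitted).
    """
    missing = {op.kind for op in ops} - {u.kind for u in units}
    if missing:
        raise ValueError(f"no functional unit for op kinds: {sorted(missing)}")
    by_name = {op.name: op for op in ops}
    succs = {op.name: [] for op in ops}
    for op in ops:
        for d in op.deps:
            succs[d].append(op.name)

    @lru_cache(maxsize=None)
    def height(name):
        # Length of the longest dependence chain starting at `name` (critical-path priority).
        return 1 + max((height(s) for s in succs[name]), default=0)

    ready_at = {}                             # op name -> first cycle its result is available
    busy_until = {u.name: 0 for u in units}   # first cycle each unit can accept a new op
    pending, control, cycle = set(by_name), [], 0
    while pending:
        word = {}
        # Ops whose operands are all available this cycle, most critical first.
        ready = sorted(
            (n for n in pending
             if all(d in ready_at and ready_at[d] <= cycle for d in by_name[n].deps)),
            key=lambda n: (-height(n), n),
        )
        for n in ready:
            for u in units:
                if u.kind == by_name[n].kind and busy_until[u.name] <= cycle:
                    word[u.name] = n
                    ready_at[n] = cycle + u.latency
                    busy_until[u.name] = cycle + (1 if u.pipelined else u.latency)
                    break
        pending -= set(word.values())
        control.append(word)
        cycle += 1
    return control

# Hypothetical example: y = (m1 + m2) + m3, where m1..m3 are independent multiplies.
ops = [Op("m1", "mul"), Op("m2", "mul"), Op("m3", "mul"),
       Op("a1", "add", ["m1", "m2"]), Op("a2", "add", ["a1", "m3"])]
# Base datapath: one non-pipelined multiplier and one adder.
base = [Unit("MUL0", "mul", 2, False), Unit("ALU0", "add", 1, True)]
# Customized datapath: two pipelined multipliers add horizontal and vertical parallelism.
custom = [Unit("MUL0", "mul", 2, True), Unit("MUL1", "mul", 2, True),
          Unit("ALU0", "add", 1, True)]

for c, word in enumerate(list_schedule(ops, custom)):
    print(c, word)
# 0 {'MUL0': 'm1', 'MUL1': 'm2'}
# 1 {'MUL0': 'm3'}
# 2 {'ALU0': 'a1'}
# 3 {'ALU0': 'a2'}
```

Under this toy model the base datapath needs 7 control words while the customized one needs 4. In cycle 0 the two multipliers work side by side (horizontal parallelism), and in cycle 1 `MUL0` accepts `m3` while `m1` is still in flight (vertical parallelism). Because the compiler emits each control word directly, adding units changes only the datapath description passed to the scheduler, not an instruction set or a decoder.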