FlexCore: Utilizing Exposed Datapath Control for Efficient Computing

Authors:
Martin Thuresson;Magnus Själander;Magnus Björk;Lars Svensson;Per Larsson-Edefors;Per Stenström
Affiliations:
Chalmers University of Technology, Gothenburg, Sweden;Chalmers University of Technology, Gothenburg, Sweden;Chalmers University of Technology, Gothenburg, Sweden;Chalmers University of Technology, Gothenburg, Sweden;Chalmers University of Technology, Gothenburg, Sweden;Chalmers University of Technology, Gothenburg, Sweden
Venue:
Journal of Signal Processing Systems
Year:
2009

Abstract

We introduce FlexCore, the first exemplar of an architecture based on the FlexSoC framework. Comprising the same datapath units found in a conventional five-stage pipeline, the FlexCore has an exposed datapath control and a flexible interconnect to allow the datapath to be dynamically reconfigured as a consequence of code generation. Additionally, the FlexCore allows specialized datapath units to be inserted and utilized within the same architecture and compilation framework. This study shows that, in comparison to a conventional five-stage general-purpose processor, the FlexCore is up to 40% more efficient in terms of cycle count on a set of benchmarks from the embedded application domain. We show that both the fine-grained control and the flexible interconnect contribute to the speedup. Furthermore, according to our VLSI implementation study, the FlexCore architecture offers both time and energy savings. The exposed FlexCore datapath requires a wide control word. The conducted evaluation confirms that this increases the instruction bandwidth and memory footprint. This calls for efficient instruction decoding as proposed in the FlexSoC framework.