Simultaneous resource binding and interconnection optimization based on a distributed register-file microarchitecture

Authors:
Jason Cong;Yiping Fan;Junjuan Xu
Affiliations:
University of California, Los Angeles, Los Angeles, CA;AutoESL, Inc., Cupertino, CA;University of California, Los Angeles, Los Angeles, CA
Venue:
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Year:
2009

Abstract

Behavioral synthesis and optimization beyond the register-transfer level require efficient use of the features of the underlying platform. This article presents a platform-based resource binding approach built on a Distributed Register-File Microarchitecture (DRFM), which uses the distributed embedded memory blocks of modern FPGAs as register files. A DRFM contains multiple islands, each with a local register file, a pool of functional units, and data-routing logic. Compared with its traditional discrete-register counterpart, a DRFM can implement its local register files with the platform's on-chip memory or register-file IP blocks, which substantially reduces multiplexing logic and global interconnect. DRFM also gives synthesis algorithms a useful architectural template and a direct optimization objective: minimizing interisland connections. Given a schedule and resource (functional-unit) constraints, we develop two novel DRFM-based algorithms for the resource binding stage: (i) a simultaneous DRFM clustering and binding algorithm, which determines the DRFM configuration and assigns operations to islands with a focus on optimizing global connections; and (ii) a data-forwarding scheduling algorithm, which exploits operation slack to cope with the limited number of register-file read ports. On the Xilinx Virtex-4 FPGA platform, experiments on a set of real-life test cases show that our approach reduces logic area by 50% and improves performance by 14.6% compared with the traditional discrete-register-based approach. On small designs, our algorithm produces the same total number of connections as the optimal solutions generated by ILP, with a maximum feeding-in connection count at most one higher than the optimum.
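The two connection metrics reported in the abstract (total connections and maximum feeding-in connections) can be made concrete with a small model. The Python sketch below is an illustration under our own simplifying assumptions, not the article's formulation: a binding maps each operation to an island, every producer-to-consumer data edge whose endpoints sit in different islands needs a global connection, and a connection is counted once per ordered pair of distinct islands regardless of how many values travel over it. An island's feeding-in connections are then the other islands that send it data. The function name `connection_metrics` and the example data-flow graph are hypothetical.

```python
from collections import defaultdict


def connection_metrics(edges, island_of):
    """Return (total interisland connections, maximum feeding-in connections).

    Simplified model (an assumption for illustration): each data edge
    (producer, consumer) whose operations are bound to different islands
    requires a global connection from the producer's island to the
    consumer's island; parallel edges between the same ordered island
    pair share one connection.
    """
    connections = set()
    for producer, consumer in edges:
        src, dst = island_of[producer], island_of[consumer]
        if src != dst:
            connections.add((src, dst))

    # Count, for every destination island, how many distinct islands feed it.
    feeding_in = defaultdict(int)
    for _src, dst in connections:
        feeding_in[dst] += 1

    return len(connections), max(feeding_in.values(), default=0)


# Hypothetical data-flow graph: a and b feed c; c and d feed e; a also feeds d.
edges = [("a", "c"), ("b", "c"), ("c", "e"), ("d", "e"), ("a", "d")]

clustered = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
scattered = {"a": 0, "b": 1, "c": 2, "d": 1, "e": 0}

print(connection_metrics(edges, clustered))  # (1, 1)
print(connection_metrics(edges, scattered))  # (5, 2)
```

Both bindings place the same operations; keeping communicating operations in the same island removes global wiring. A real binder would also have to respect the functional-unit constraints of each island, which this sketch ignores.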
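The read-port restriction that the data-forwarding scheduling step handles can likewise be illustrated with a toy model. The sketch below is an assumption-laden stand-in, not the article's algorithm: each operand read from a local register file gets a window of integer cycles, from the cycle its value becomes available (`earliest`) to the cycle its consumer needs it (`latest`), and the register file serves at most `read_ports` reads per cycle. An operand read before `latest` is assumed to be held and forwarded to its consumer, so the slack in the window is what lets reads spread across cycles. Under this model, placing reads is unit-time scheduling with release times and deadlines, for which earliest-deadline-first (EDF) is a standard method that finds a feasible schedule whenever one exists. EDF is a swapped-in textbook technique, and `schedule_reads` is a hypothetical name.

```python
import heapq


def schedule_reads(requests, read_ports):
    """Assign each register-file read to a cycle inside its window (EDF).

    requests: list of (name, earliest, latest) tuples with integer cycles.
    read_ports: reads the register file can serve per cycle (>= 1).
    Returns {name: cycle}, or None if some read cannot meet its deadline.
    """
    pending = sorted(requests, key=lambda r: r[1])  # order by release cycle
    ready = []  # min-heap of (latest, name): released but not yet scheduled
    schedule = {}
    i = 0
    cycle = pending[0][1] if pending else 0

    while i < len(pending) or ready:
        # Jump over idle cycles when nothing is waiting to be read.
        if not ready and pending[i][1] > cycle:
            cycle = pending[i][1]
        # Release every read whose value is available by this cycle.
        while i < len(pending) and pending[i][1] <= cycle:
            name, _earliest, latest = pending[i]
            heapq.heappush(ready, (latest, name))
            i += 1
        # Serve up to read_ports reads, most urgent deadline first.
        for _ in range(read_ports):
            if not ready:
                break
            latest, name = heapq.heappop(ready)
            if latest < cycle:
                return None  # a deadline was already missed: infeasible
            schedule[name] = cycle
        cycle += 1

    return schedule


# Hypothetical cycle-1 operation needing three operands from a 2-read-port file.
# Operand "a" is available one cycle early, so its read can move to cycle 0.
print(schedule_reads([("a", 0, 1), ("b", 1, 1), ("c", 1, 1)], read_ports=2))
# {'a': 0, 'b': 1, 'c': 1}
print(schedule_reads([("a", 1, 1), ("b", 1, 1), ("c", 1, 1)], read_ports=2))
# None
```

Without slack on "a", three reads compete for two ports in the same cycle and the model reports infeasibility. Under this model each early read would cost a holding register, and the sketch makes no attempt to minimize that cost.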