Fibonacci heaps and their uses in improved network optimization algorithms
Journal of the ACM (JACM)
Data path allocation based on bipartite weighted matching
DAC '90 Proceedings of the 27th ACM/IEEE Design Automation Conference
High-level synthesis: introduction to chip and system design
High-level synthesis: introduction to chip and system design
False loops through resource sharing
ICCAD '92 1992 IEEE/ACM international conference proceedings on Computer-aided design
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Register allocation and binding for low power
DAC '95 Proceedings of the 32nd annual ACM/IEEE Design Automation Conference
A new approach to the multiport memory allocation problem in data path synthesis
Integration, the VLSI Journal
A scheduling algorithm for multiport memory minimization in datapath synthesis
ASP-DAC '95 Proceedings of the 1995 Asia and South Pacific Design Automation Conference
Low energy memory and register allocation using network flow
DAC '97 Proceedings of the 34th annual Design Automation Conference
The multicluster architecture: reducing cycle time through partitioning
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Low-energy embedded FPGA structures
ISLPED '98 Proceedings of the 1998 international symposium on Low power electronics and design
A bandwidth-efficient architecture for media processing
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
High-level synthesis under multi-cycle interconnect delay
Proceedings of the 2001 Asia and South Pacific Design Automation Conference
Forward-looking objective functions: concept & applications in high level synthesis
Proceedings of the 39th annual Design Automation Conference
Synthesis and Optimization of Digital Circuits
Synthesis and Optimization of Digital Circuits
Behavior-to-placed RTL synthesis with performance-driven placement
Proceedings of the 2001 IEEE/ACM international conference on Computer-aided design
Efficient circuit clustering for area and power reduction in FPGAs
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Fast Prototyping of Datapath-Intensive Architectures
IEEE Design & Test
Imagine: Media Processing with Streams
IEEE Micro
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
VLSI Architecture: Past, Present, and Future
ARVLSI '99 Proceedings of the 20th Anniversary Conference on Advanced Research in VLSI
Low-power high-level synthesis for FPGA architectures
Proceedings of the 2003 international symposium on Low power electronics and design
Interface Synthesis using Memory Mapping for an FPGA Platform
ICCD '03 Proceedings of the 21st International Conference on Computer Design
Register binding and port assignment for multiplexer optimization
Proceedings of the 2004 Asia and South Pacific Design Automation Conference
Fully distributed register files for heterogeneous clustered microarchitectures
Fully distributed register files for heterogeneous clustered microarchitectures
Platform-based resource binding using a distributed register-file microarchitecture
Proceedings of the 2006 IEEE/ACM international conference on Computer-aided design
Interconnect and communication synthesis for distributed register-file microarchitecture
Proceedings of the 44th annual Design Automation Conference
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Simultaneous FU and register binding based on network flow method
Proceedings of the conference on Design, automation and test in Europe
Architecture and synthesis for on-chip multicycle communication
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
High-level synthesis with distributed controller for fast timing closure
Proceedings of the International Conference on Computer-Aided Design
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Critical-path-aware high-level synthesis with distributed controller for fast timing closure
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Hi-index | 0.00 |
Behavior synthesis and optimization beyond the register-transfer level require an efficient utilization of the underlying platform features. This article presents a platform-based resource binding approach based on a Distributed Register-File Microarchitecture (DRFM), which makes efficient use of distributed embedded memory blocks as register files in modern FPGAs. DRFM contains multiple islands, each having a local register file, a functional unit pool, and data-routing logic. Compared to the traditional discrete-register counterpart, a DRFM allows use of the platform-featured on-chip memory or register-file IP blocks to implement its local register files, and this results in a substantial saving of multiplexing logic and global interconnects. DRFM provides a useful architectural template and a direct optimization objective for minimizing interisland connections for synthesis algorithms. Given the scheduling solution and resource (functional units) constraints, two novel algorithms in the resource binding stage are developed based on DRFM: (i) a simultaneous DRFM clustering and binding algorithm, which decides the configuration of DRFM and the assignment of operations into islands with the focus on optimizing global connections; (ii) a data-forwarding scheduling algorithm, which takes advantage of the operation slacks to handle the read-port restriction of register files. On the Xilinx Virtex4 FPGA platform, experimental results with a set of real-life test cases show a 50% logic area reduction achieved by applying our approach, with a 14.6% performance improvement, compared to the traditional discrete-register-based approach. Also, experiments on small-size designs show that our algorithm produces the same number of total connections and at most one more maximum feeding-in connection compared to optimal solutions generated by ILP.