Logical effort: designing fast CMOS circuits
Logical effort: designing fast CMOS circuits
PipeRench: a co/processor for streaming multimedia acceleration
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
The design of a SRAM-based field-programmable gate array—part II: circuit design and layout
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
High-performance carry chains for FPGA's
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
CHIMAERA: a high-performance architecture with a tightly-coupled reconfigurable functional unit
Proceedings of the 27th annual international symposium on Computer architecture
The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Design Challenges of Technology Scaling
IEEE Micro
Architecture Design of Reconfigurable Pipelined Datapaths
ARVLSI '99 Proceedings of the 20th Anniversary Conference on Advanced Research in VLSI
Garp: a MIPS processor with a reconfigurable coprocessor
FCCM '97 Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines
Technology Independent Area and Delay Estimations for MicroprocessorBuilding Blocks
Technology Independent Area and Delay Estimations for MicroprocessorBuilding Blocks
A wire delay-tolerant reconfigurable unit for a clustered programmable-reconfigurable processor
Microprocessors & Microsystems
An overview of reconfigurable hardware in embedded systems
EURASIP Journal on Embedded Systems
Design principles for a virtual multiprocessor
Proceedings of the 2007 annual research conference of the South African institute of computer scientists and information technologists on IT research in developing countries
Hi-index | 0.00 |
In a clustered programmable-reconfigurable processor, multiple programmable processors and blocks of reconfigurable logic communicate through a register-based communication mechanism, which reduces the impact of wire delay on clock cycle time. In this paper, we present a circuit-level design for the reconfigurable clusters used on the Amalgam programmable-reconfigurable processor. We outline our interleaved reconfigurable array design, which provides high bandwidth to and from the register file without requiring large amounts of register control logic. We characterize the latency of operations in our array, and present results that show the impact that this latency has on overall system performance in a range of fabrication processes. Finally, we present a pipelining scheme that enables the array to operate at clock rates closer to those of programmable processors and allows for better scaling in future technologies.