The multiflow trace scheduling compiler
The Journal of Supercomputing - Special issue on instruction-level parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Proceedings of the 28th annual international symposium on Microarchitecture
Exploiting fine-grain thread level parallelism on the MIT multi-ALU processor
Proceedings of the 25th annual international symposium on Computer architecture
HSRA: high-speed, hierarchical synchronous reconfigurable array
FPGA '99 Proceedings of the 1999 ACM/SIGDA seventh international symposium on Field programmable gate arrays
PipeRench: a co/processor for streaming multimedia acceleration
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
High-performance carry chains for FPGA's
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
CHIMAERA: a high-performance architecture with a tightly-coupled reconfigurable functional unit
Proceedings of the 27th annual international symposium on Computer architecture
Clock rate versus IPC: the end of the road for conventional microarchitectures
Proceedings of the 27th annual international symposium on Computer architecture
Interconnect pipelining in a throughput-intensive FPGA architecture
FPGA '01 Proceedings of the 2001 ACM/SIGDA ninth international symposium on Field programmable gate arrays
The case for registered routing switches in field programmable gate arrays
FPGA '01 Proceedings of the 2001 ACM/SIGDA ninth international symposium on Field programmable gate arrays
The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
The Garp Architecture and C Compiler
Computer
Architecture Design of Reconfigurable Pipelined Datapaths
ARVLSI '99 Proceedings of the 20th Anniversary Conference on Advanced Research in VLSI
NAPA C: Compiling for a Hybrid RISC/FPGA Architecture
FCCM '98 Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines
ICCD '02 Proceedings of the 2002 IEEE International Conference on Computer Design: VLSI in Computers and Processors (ICCD'02)
Technology Independent Area and Delay Estimations for MicroprocessorBuilding Blocks
Technology Independent Area and Delay Estimations for MicroprocessorBuilding Blocks
Exploration of pipelined FPGA interconnect structures
FPGA '04 Proceedings of the 2004 ACM/SIGDA 12th international symposium on Field programmable gate arrays
A reconfigurable unit for a clustered programmable-reconfigurable processor
FPGA '04 Proceedings of the 2004 ACM/SIGDA 12th international symposium on Field programmable gate arrays
Hi-index | 0.00 |
Wire delay is rapidly becoming a major bottleneck in reconfigurable systems, creating a significant gap between the clock rates of reconfigurable logic and custom circuits. In this paper, we describe the design of the reconfigurable clusters on the Amalgam clustered programmable-reconfigurable processor. Amalgam's reconfigurable clusters are divided into four segments of reconfigurable logic, limiting the length of individual wires in the cluster. They support pipelining of wire delays by providing pipeline registers at the intersections between wires in the reconfigurable cluster, retiming buffers at the inputs and outputs of logic blocks, and register queues that reduce the amount of inter-cluster synchronization required in programs. Together, these mechanisms increase the clock rates of Amalgam's reconfigurable clusters by up to 70%, allowing Amalgam to maintain a 2.6x performance advantage over a purely-programmable processor in a wide range of fabrication processes.