A wire delay-tolerant reconfigurable unit for a clustered programmable-reconfigurable processor

Authors:
Richard B. Kujoth, Chi-Wei Wang, Jeffrey J. Cook, Derek B. Gottlieb, Nicholas P. Carter
Affiliations:
Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, 1308 West Main St., Urbana, IL 61801, USA (all authors)
Venue:
Microprocessors & Microsystems
Year:
2007

Abstract

Wire delay is rapidly becoming a major bottleneck in reconfigurable systems, creating a significant gap between the clock rates of reconfigurable logic and custom circuits. In this paper, we describe the design of the reconfigurable clusters on the Amalgam clustered programmable-reconfigurable processor. Amalgam's reconfigurable clusters are divided into four segments of reconfigurable logic, limiting the length of individual wires in the cluster. They support pipelining of wire delays by providing pipeline registers at the intersections between wires in the reconfigurable cluster, retiming buffers at the inputs and outputs of logic blocks, and register queues that reduce the amount of inter-cluster synchronization required in programs. Together, these mechanisms increase the clock rates of Amalgam's reconfigurable clusters by up to 70%, allowing Amalgam to maintain a 2.6× performance advantage over a purely programmable processor across a wide range of fabrication processes.
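The benefit of registers at wire intersections can be illustrated with a toy timing model: a long route is treated as a chain of wire segments, an enabled register at an intersection ends one pipeline stage, and the clock period is set by the slowest stage plus a fixed register overhead. The sketch below is a hypothetical illustration under those simplifying assumptions, not Amalgam's actual timing model or tools; the function names, the greedy placement rule, and the delay values (in arbitrary units) are all invented for illustration.

```python
from typing import List


def stage_delays(segment_delays: List[float], register_after: List[bool]) -> List[float]:
    """Sum wire-segment delays between consecutive enabled pipeline registers.

    register_after[i] is True when the (hypothetical) register at the
    intersection following segment i is enabled, ending a pipeline stage.
    """
    if len(segment_delays) != len(register_after):
        raise ValueError("need one register flag per wire segment")
    stages: List[float] = []
    current = 0.0
    for delay, reg in zip(segment_delays, register_after):
        current += delay
        if reg:  # register ends the current stage
            stages.append(current)
            current = 0.0
    if current > 0.0:  # trailing stage ending at the destination
        stages.append(current)
    return stages


def clock_period(segment_delays: List[float], register_after: List[bool],
                 register_overhead: float) -> float:
    """Toy clock period: slowest stage delay plus a fixed register overhead."""
    stages = stage_delays(segment_delays, register_after)
    if not stages:
        raise ValueError("route has no wire segments")
    return max(stages) + register_overhead


def greedy_registers(segment_delays: List[float], max_stage_delay: float) -> List[bool]:
    """Enable a register at an intersection whenever adding the next segment
    would push the current stage past max_stage_delay."""
    regs = [False] * len(segment_delays)
    current = 0.0
    for i, delay in enumerate(segment_delays):
        if delay > max_stage_delay:
            raise ValueError("a single segment exceeds the stage budget")
        if i > 0 and current + delay > max_stage_delay:
            regs[i - 1] = True  # register at the intersection before segment i
            current = 0.0
        current += delay
    return regs


if __name__ == "__main__":
    route = [1.0, 1.0, 1.0, 1.0]  # four equal wire segments (arbitrary units)
    unpipelined = clock_period(route, [False] * len(route), register_overhead=0.2)
    regs = greedy_registers(route, max_stage_delay=2.0)
    pipelined = clock_period(route, regs, register_overhead=0.2)
    print(regs, unpipelined, pipelined)
```

In this toy example, one enabled register splits the four-segment route into two stages, so the period drops from 4.2 to 2.2 units at the cost of one extra cycle of latency. The numbers are illustrative only and are not meant to reproduce the 70% figure reported above.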