A Reconfigurable Functional Unit for TriMedia/CPU64. A Case Study
Embedded Processor Design Challenges: Systems, Architectures, Modeling, and Simulation - SAMOS
This paper presents an experiment that aims to assess the potential performance impact of augmenting a TriMedia/CPU64 processor with a reconfigurable core. We first propose the skeleton of an extension of the TriMedia/CPU64 architecture, which consists of a Reconfigurable Functional Unit (RFU) and the associated instructions. Then, we address the computation of the 8×8 IDCT on such an extended TriMedia and propose a scheme to implement the 1-D IDCT operation on the RFU. When implemented on an ACEX EP1K100 FPGA from Altera, the proposed 1-D IDCT exhibits a latency of 16 and a recovery of 2 TriMedia (200 MHz) cycles, and occupies 42% of the device. By configuring the 1-D IDCT computing facility on the RFU at application load-time, a 2-D IDCT including all overheads can be computed with a throughput of 1/32 IDCT/cycle. This is an improvement of more than 40% over the standard TriMedia/CPU64.
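To make the relation between the 1-D and 2-D transforms concrete, the following Python sketch builds an 8×8 IDCT from sixteen 8-point 1-D IDCTs using the standard row-column decomposition. This is an assumption for illustration: the abstract does not state which decomposition or which 1-D algorithm the RFU implements, and the function names (`idct_1d`, `idct_2d`, `cycles_per_block`) are hypothetical. The small cycle estimate at the end only shows that sixteen 1-D transforms issued every 2 cycles would be consistent with the reported 1/32 IDCT/cycle; it is not a model of the actual pipeline.

```python
import math

N = 8  # block size of the 8x8 IDCT


def idct_1d(coeffs):
    """Orthonormal 8-point 1-D IDCT (inverse DCT-II) by direct summation."""
    out = []
    for x in range(N):
        total = 0.0
        for u in range(N):
            scale = math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N)
            total += scale * coeffs[u] * math.cos((2 * x + 1) * u * math.pi / (2 * N))
        out.append(total)
    return out


def idct_2d(block):
    """2-D IDCT as 8 row transforms followed by 8 column transforms."""
    rows = [idct_1d(row) for row in block]
    cols = [idct_1d([rows[r][c] for r in range(N)]) for c in range(N)]
    # cols is indexed [column][row]; transpose back to [row][column].
    return [[cols[c][r] for c in range(N)] for r in range(N)]


def cycles_per_block(recovery_cycles=2, transforms_per_block=2 * N):
    """Illustrative estimate: one 1-D IDCT issued every `recovery_cycles`."""
    return recovery_cycles * transforms_per_block


if __name__ == "__main__":
    dc_block = [[0.0] * N for _ in range(N)]
    dc_block[0][0] = 8.0  # DC-only input gives a flat output block
    print(idct_2d(dc_block)[0])
    print(cycles_per_block(), "cycles per 2-D IDCT (assumed issue pattern)")
    print(200e6 / cycles_per_block(), "2-D IDCTs per second at 200 MHz")
```

At 200 MHz, a throughput of 1/32 IDCT/cycle corresponds to 6.25 million 8×8 IDCTs per second.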