Discrete cosine transform: algorithms, advantages, applications
Discrete cosine transform: algorithms, advantages, applications
A high-performance microarchitecture with hardware-programmable functional units
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
MPEG Video Compression Standard
MPEG Video Compression Standard
FPGA and CPLD Architectures: A Tutorial
IEEE Design & Test
FCCM '97 Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines
Garp: a MIPS processor with a reconfigurable coprocessor
FCCM '97 Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines
ConCISe: A Compiler-Driven CPLD-Based Instruction Set Accelerator
FCCM '99 Proceedings of the Seventh Annual IEEE Symposium on Field-Programmable Custom Computing Machines
TriMedia CPU64 Application Development Environment
ICCD '99 Proceedings of the 1999 IEEE International Conference on Computer Design
ICCD '99 Proceedings of the 1999 IEEE International Conference on Computer Design
An 8x8 IDCT Implementation on an FPGA-Augmented TriMedia
FCCM '01 Proceedings of the the 9th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
High speed pattern matching for a fast Huffman decoder
IEEE Transactions on Consumer Electronics
Hi-index | 0.00 |
The paper presents a case study on augmenting a TriMedia/CPU64 processor with a Reconfigurable (FPGA-based) Functional Unit (RFU). We first propose an extension of the TriMedia/CPU64 architecture, which consists of a RFU and its associated instructions. Then, we address the computation of the 8 × 8 IDCT on such extended TriMedia, and propose a scheme to implement an 8-point IDCT operation on the RFU. Further, we address the decoding of Variable Length Codes and describe the FPGA implementation of a Variable Length Decoder (VLD) computing facility. When mapped on an ACEX EP1K100 FPGA from Altera, our 8-point IDCT exhibits a latency of 16 and a recovery of 2 Tri-Media cycles, and occupies 42% of the FPGA's logic array blocks. The proposed VLD exhibits a latency of 7 TriMedia cycles when mapped on the same FPGA, and utilizes 6 of its embedded array blocks. By using the 8-point IDCT computing facility, an 8 × 8 IDCT including all overheads can be computed with the throughput of 1/32 IDCT/cycle. Also, with the proposed VLD computing facility, a single DCT coefficient can be decoded in 11 cycles including all overheads. Simulation results indicate that by configuring each of the 8-point IDCT and VLD computing facilities on a different FPGA context, and by activating the contexts as needed, the augmented TriMedia can perform MPEG macroblock parsing followed up by a pel reconstruction with an improvement of 20-25% over the standard TriMedia.