A reconfigurable functional unit for TriMedia/CPU64. A case study

  • Authors:
  • Mihai Sima;Sorin Cotofana;Stamatis Vassiliadis;Jos T. J. van Eijndhoven;Kees Vissers

  • Affiliations:
  • Delft University of Technology, Department of Electrical Engineering, Mekelweg 4, 2628 CD Delft, The Netherlands and Philips Research Laboratories, Department of Information and Software Technolog ...;Delft University of Technology, Department of Electrical Engineering, Mekelweg 4, 2628 CD Delft, The Netherlands;Delft University of Technology, Department of Electrical Engineering, Mekelweg 4, 2628 CD Delft, The Netherlands;Philips Research Laboratories, Department of Information and Software Technology, Professor Holstlaan 4, 5656 AA Eindhoven, The Netherlands;TriMedia Technologies, Inc., 1840 McCarthy Boulevard, Milpitas, California

  • Venue:
  • Embedded processor design challenges
  • Year:
  • 2002

Abstract

The paper presents a case study on augmenting a TriMedia/CPU64 processor with a Reconfigurable (FPGA-based) Functional Unit (RFU). We first propose an extension of the TriMedia/CPU64 architecture, which consists of an RFU and its associated instructions. Then, we address the computation of the 8 × 8 IDCT on such an extended TriMedia, and propose a scheme to implement an 8-point IDCT operation on the RFU. Further, we address the decoding of Variable Length Codes and describe the FPGA implementation of a Variable Length Decoder (VLD) computing facility. When mapped on an ACEX EP1K100 FPGA from Altera, our 8-point IDCT exhibits a latency of 16 and a recovery of 2 TriMedia cycles, and occupies 42% of the FPGA's logic array blocks. The proposed VLD exhibits a latency of 7 TriMedia cycles when mapped on the same FPGA, and utilizes 6 of its embedded array blocks. By using the 8-point IDCT computing facility, an 8 × 8 IDCT, including all overheads, can be computed with a throughput of 1/32 IDCT/cycle. Also, with the proposed VLD computing facility, a single DCT coefficient can be decoded in 11 cycles, including all overheads. Simulation results indicate that by configuring each of the 8-point IDCT and VLD computing facilities on a different FPGA context, and by activating the contexts as needed, the augmented TriMedia can perform MPEG macroblock parsing followed by pel reconstruction with an improvement of 20-25% over the standard TriMedia.
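
The relation between the 8-point IDCT operation and the full 8 × 8 IDCT can be illustrated with a small software sketch. It is not the RFU implementation described in the paper: it is a floating-point reference that assumes the standard row-column (separable) decomposition, in which an 8 × 8 IDCT is built from 16 one-dimensional 8-point IDCTs. The names `idct8`, `idct8x8`, and `cycles_per_block` are hypothetical. The cycle estimate is only one plausible reading of the 1/32 IDCT/cycle figure, assuming a new 8-point IDCT can be issued every 2 cycles (the stated recovery) and that the remaining overheads are hidden; the abstract does not spell out this accounting.

```python
import math

N = 8  # transform size used in the abstract


def idct8(coeffs):
    """Reference 8-point inverse DCT (orthonormal DCT-III), floating point."""
    out = []
    for n in range(N):
        acc = 0.0
        for k in range(N):
            # Orthonormal scaling: sqrt(1/N) for the DC term, sqrt(2/N) otherwise.
            scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
            acc += scale * coeffs[k] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
        out.append(acc)
    return out


def idct8x8(block):
    """8 x 8 IDCT by row-column decomposition: 8 row passes, then 8 column passes."""
    rows = [idct8(row) for row in block]                               # 8 calls
    cols = [idct8([rows[r][c] for r in range(N)]) for c in range(N)]   # 8 calls
    # cols[c][r] holds the sample at row r, column c; transpose back to row-major.
    return [[cols[c][r] for c in range(N)] for r in range(N)]


# Assumption: one 8 x 8 block needs 2 * 8 = 16 one-dimensional transforms.
CALLS_PER_BLOCK = 2 * N
# Recovery of the 8-point IDCT operation, as quoted in the abstract.
RECOVERY_CYCLES = 2


def cycles_per_block(calls=CALLS_PER_BLOCK, recovery=RECOVERY_CYCLES):
    """Hypothetical estimate: back-to-back issue, one operation per recovery period."""
    return calls * recovery
```

Under these assumptions, 16 operations issued 2 cycles apart occupy 32 cycles per block, which matches the quoted throughput of 1/32 IDCT/cycle. A block whose only nonzero coefficient is a DC value of 8 reconstructs to a flat block of ones, a quick sanity check for the scaling.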