Full Length Article: A high performance, area efficient TTA-like vertex shader architecture with optimized floating point arithmetic unit for embedded graphics applications

  • Authors:
  • Yisong Chang;Jizeng Wei;Wei Guo;Jizhou Sun

  • Venue:
  • Microprocessors & Microsystems
  • Year:
  • 2013

Abstract

A fully programmable vertex shader based on the Transport Triggered Architecture (TTA) is proposed in this paper to provide high performance and efficient datapath connectivity for embedded applications. At the architecture level, fine-grained data transport in the TTA datapath and multi-threading are adopted to exploit instruction-level and data-level parallelism, respectively, in graphics applications. Datapath connectivity is optimized mainly through the architecturally visible bypassing native to TTA and through hybrid result re-collection schemes. At the shader core level, a novel SIMD multi-functional dot-product unit and an area-efficient special function unit are introduced for floating-point processing. The proposed processor achieves a peak capacity of 1.5 GFLOPS and 125 Mvertices/s. Compared with a previous expanded VLIW vertex processor, it reduces total hardware cost by 17.6% and delivers a 1.3 times improvement in the performance-per-logic-cost ratio on real graphics benchmarks in a 0.18 μm CMOS process.
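The abstract does not describe the shader's instruction format, so the following is only a minimal Python sketch of the general transport-triggered idea, not the authors' design. In a TTA the program consists of data moves: writing a function unit's operand port merely latches a value, while writing its trigger port starts the operation. A function unit's result port can also be read directly by the next move, which is the software-visible bypass the abstract refers to. Every name here (`FunctionUnit`, `TTAMachine`, the port suffixes `.o`/`.t`/`.r`, the register names) is hypothetical. The sketch executes one move at a time with zero latency, whereas a real TTA issues several moves per cycle over parallel buses and has pipelined units. Its `dot4` is a plain scalar loop, not a model of the paper's SIMD multi-functional unit.

```python
from typing import Callable, Dict, List, Tuple, Union

Vec4 = Tuple[float, float, float, float]
Value = Union[float, Vec4]


class FunctionUnit:
    """Hypothetical TTA function unit with operand, trigger and result ports.

    Writing the operand port only latches a value; writing the trigger port
    latches the second input and starts the operation. For simplicity the
    result is visible on the result port immediately (zero latency).
    """

    def __init__(self, op: Callable[[Value, Value], Value]) -> None:
        self.op = op
        self.operand: Value = 0.0
        self.result: Value = 0.0

    def trigger(self, value: Value) -> None:
        self.result = self.op(self.operand, value)


def dot4(a: Vec4, b: Vec4) -> float:
    # 4-component dot product, the basic operation of a vertex transform.
    return sum(x * y for x, y in zip(a, b))


class TTAMachine:
    """Hypothetical machine whose program is only a list of (source, destination) moves."""

    def __init__(self, regs: Dict[str, Value]) -> None:
        self.regs: Dict[str, Value] = dict(regs)
        self.fus: Dict[str, FunctionUnit] = {
            "dp": FunctionUnit(dot4),
            "add": FunctionUnit(lambda a, b: a + b),
        }

    def read(self, src: str) -> Value:
        # "<fu>.r" reads a result port directly (software-visible bypass);
        # any other name reads the register file.
        if src.endswith(".r"):
            return self.fus[src[:-2]].result
        return self.regs[src]

    def move(self, src: str, dst: str) -> None:
        value = self.read(src)
        if dst.endswith(".o"):
            self.fus[dst[:-2]].operand = value
        elif dst.endswith(".t"):
            self.fus[dst[:-2]].trigger(value)  # the transport itself fires the operation
        else:
            self.regs[dst] = value

    def run(self, program: List[Tuple[str, str]]) -> None:
        for src, dst in program:
            self.move(src, dst)


# Transform a vertex by the rows m0..m3 of a 4x4 matrix: out_i = dot(m_i, v).
program: List[Tuple[str, str]] = []
for i in range(4):
    program += [(f"m{i}", "dp.o"), ("v", "dp.t"), ("dp.r", f"out{i}")]

# Lighting term dot(n, l) + ambient: the dot-product result is moved straight
# into the adder ("dp.r" -> "add.o") without a register-file write.
program += [("n", "dp.o"), ("l", "dp.t"), ("dp.r", "add.o"),
            ("ambient", "add.t"), ("add.r", "light")]

machine = TTAMachine({
    "m0": (1.0, 0.0, 0.0, 5.0), "m1": (0.0, 1.0, 0.0, -1.0),
    "m2": (0.0, 0.0, 1.0, 0.0), "m3": (0.0, 0.0, 0.0, 1.0),
    "v": (1.0, 2.0, 3.0, 1.0),
    "n": (0.0, 0.0, 1.0, 0.0), "l": (0.0, 0.5, 0.5, 0.0), "ambient": 0.25,
})
machine.run(program)
print([machine.regs[f"out{i}"] for i in range(4)], machine.regs["light"])
```

Running it prints `[6.0, 1.0, 3.0, 1.0] 0.75`: the translated vertex and the lit intensity. Because the compiler, not the hardware, schedules the `dp.r -> add.o` transfer, intermediate values can skip the register file entirely, which is one way a TTA can reduce register-file ports and datapath connectivity.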