Full Length Article: A high performance, area efficient TTA-like vertex shader architecture with optimized floating point arithmetic unit for embedded graphics applications

Authors:
Yisong Chang; Jizeng Wei; Wei Guo; Jizhou Sun
Venue:
Microprocessors & Microsystems
Year:
2013

Abstract

This paper proposes a fully programmable vertex shader based on the Transport Triggered Architecture (TTA) that delivers high performance and efficient datapath connectivity for embedded applications. At the architecture level, fine-grained data transport in the TTA datapath exploits instruction-level parallelism, and multi-threading exploits data-level parallelism in graphics applications. Datapath connectivity is optimized mainly through the architecturally visible bypassing native to TTA and through hybrid result re-collection schemes. At the shader-core level, a novel SIMD multi-functional dot-product unit and an area-efficient special function unit are introduced for floating-point processing. The proposed processor achieves a peak throughput of 1.5 GFLOPS and 125 Mvertices/s. Implemented in a 0.18 μm CMOS process and evaluated on real graphics benchmarks, it reduces overall hardware cost by 17.6% and provides a 1.3 times improvement in the performance-per-logic-cost ratio compared with a previous expanded VLIW vertex processor.
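The core TTA idea the abstract relies on is that a program consists only of data moves, and writing a function unit's trigger port starts its operation. The following tiny sequential interpreter sketches this. It is a generic illustration of transport-triggered execution under simplifying assumptions (one move at a time, no latencies, no parallel buses), not the paper's datapath. The `FU` class, the port names `o` (operand), `t` (trigger) and `r` (result), and the 4-lane `dot4` unit are all hypothetical. The third move illustrates an architecturally visible bypass: the dot-product result is transported straight into the adder's operand port instead of passing through the register file.

```python
# Hypothetical, minimal model of transport-triggered execution (not the paper's design).

class FU:
    """A function unit with one operand port (o), a trigger port (t) and a result port (r)."""

    def __init__(self, op):
        self.op = op    # operation applied when the trigger port is written
        self.o = None   # operand port
        self.r = None   # result port

    def trigger(self, value):
        # Writing the trigger port fires the operation on (operand, trigger value).
        self.r = self.op(self.o, value)


def dot4(a, b):
    """Hypothetical 4-lane SIMD dot product: sum of a[i] * b[i]."""
    return sum(x * y for x, y in zip(a, b))


def read(src, fus, rf):
    if not isinstance(src, str):
        return src  # immediate operand
    unit, port = src.split(".")
    return rf[port] if unit == "rf" else getattr(fus[unit], port)


def write(dst, value, fus, rf):
    unit, port = dst.split(".")
    if unit == "rf":
        rf[port] = value              # register-file write
    elif port == "t":
        fus[unit].trigger(value)      # trigger move: starts the operation
    else:
        setattr(fus[unit], port, value)  # plain operand move


def run(program, fus, rf):
    # Each instruction is a single move (src -> dst); a real TTA issues several per cycle.
    for src, dst in program:
        write(dst, read(src, fus, rf), fus, rf)


fus = {"dot": FU(dot4), "add": FU(lambda a, b: a + b)}
rf = {"row": (1.0, 2.0, 3.0, 4.0), "v": (1.0, 1.0, 1.0, 1.0)}

program = [
    ("rf.row", "dot.o"),   # operand move
    ("rf.v", "dot.t"),     # trigger move: dot.r = dot4(row, v)
    ("dot.r", "add.o"),    # bypass: result goes straight to the adder, no register-file write
    (0.5, "add.t"),        # immediate into the trigger port: add.r = dot.r + 0.5
    ("add.r", "rf.out"),   # only the final value is written back
]

run(program, fus, rf)
print(rf["out"])  # 10.5
```

Because every transport is explicit in the program, a short-lived intermediate such as `dot.r` never needs a register-file write. In this run only the final value reaches `rf`.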