The design and analysis of a cache architecture for texture mapping
Proceedings of the 24th annual international symposium on Computer architecture
A highly configurable cache architecture for embedded systems
Proceedings of the 30th annual international symposium on Computer architecture
Shader Performance Analysis on a Modern GPU Architecture
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
A Floating-Point Unit for 4D Vector Inner Product with Reduced Latency
IEEE Transactions on Computers
Fast configurable-cache tuning with a unified second-level cache
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Hi-index | 0.00 |
This paper presents a fully programmable 3-D graphics processor using unified shaders for mobile environment. In the system level, we adopted dual-core, dual-issue VLIW, and multithreading methods to utilize instruction, data, and task level parallelism in the graphics applications. In the shader core level, a novel IEEE-754 compliant 4-D vector inner product arithmetic unit and a configurable texture cache are proposed. Using these methods, the proposed processor achieves 143 Mvertices/s and 2.3 Gtexels/s consuming the power of 367 mW. The evaluation shows significant performance and power-delay product benefits. For real graphics applications, test results indicate 2.07 times improvement in performance and 34% reduction in power-delay product compared to previous mobile 3-D graphics processors. The proposed 3-D graphics processor is implemented in 4.5 × 4.5 2mm using 0.18-µm CMOS technology.