Shader Performance Analysis on a Modern GPU Architecture

Authors: Victor Moya, Carlos Gonzalez, Jordi Roca, Agustin Fernandez, Roger Espasa
Affiliation: Universitat Politècnica de Catalunya (all authors)
Venue: Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture (MICRO-38)
Year: 2005

Abstract

This paper presents an analysis of the performance of the shader processing units in a modern Graphics Processing Unit (GPU) architecture using real graphics applications. It describes the architecture of a modern GPU and introduces the simulator and associated framework used to evaluate it. The paper analyses how different configurations of the shader processing units affect performance and compares a classic GPU, with separate vertex and fragment shaders, against a unified shader GPU. Compared with a similar non-unified architecture, the evaluated unified shader architecture proves to be 15% to 30% more area-efficient while improving performance by 2% to 7%.
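The comparison above rests on load balancing. In a classic GPU, vertex and fragment programs run on separate, fixed-size pools of shader units, so a batch dominated by one kind of work leaves the other pool partly idle. A unified shader pool can instead assign any unit to either kind of work. The sketch below is a deliberately simplified toy model of that effect, not the paper's simulator or methodology. It assumes each unit executes one shader instruction per cycle, that the vertex and fragment pools of the classic design run fully in parallel, and it ignores scheduling overhead, pipeline dependencies, texture latency, and the area cost of making every unit general-purpose. The names `Batch`, `split_cycles`, `unified_cycles`, and `ceil_div`, and all workload numbers, are hypothetical.

```python
from dataclasses import dataclass


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


@dataclass
class Batch:
    """Hypothetical draw batch: shader instructions to execute per stage."""
    vertex_work: int
    fragment_work: int


def split_cycles(batches: list[Batch], vertex_units: int, fragment_units: int) -> int:
    """Toy model of a classic (non-unified) GPU with fixed vertex and fragment pools.

    Both pools are assumed to run in parallel, so each batch costs as many
    cycles as its slower stage; idle units in one pool cannot help the other.
    """
    if vertex_units <= 0 or fragment_units <= 0:
        raise ValueError("each pool needs at least one unit")
    return sum(
        max(ceil_div(b.vertex_work, vertex_units),
            ceil_div(b.fragment_work, fragment_units))
        for b in batches
    )


def unified_cycles(batches: list[Batch], shader_units: int) -> int:
    """Toy model of a unified shader GPU: any unit runs either kind of work."""
    if shader_units <= 0:
        raise ValueError("need at least one shader unit")
    return sum(
        ceil_div(b.vertex_work + b.fragment_work, shader_units)
        for b in batches
    )


if __name__ == "__main__":
    # Hypothetical workload: one vertex-heavy and one fragment-heavy batch.
    batches = [Batch(800, 200), Batch(100, 1200)]
    # Same total of 8 units in both designs: 2 vertex + 6 fragment vs 8 unified.
    print("split:  ", split_cycles(batches, vertex_units=2, fragment_units=6))
    print("unified:", unified_cycles(batches, shader_units=8))
```

With 8 shader units in both designs, the two hypothetical batches take 600 cycles on the split design and 288 on the unified one, while a batch whose vertex:fragment ratio matches the 2:6 split, such as `Batch(200, 600)`, takes 100 cycles either way. The large gap in this toy comes from the extreme, made-up imbalance; on real applications the paper reports a much smaller 2% to 7% performance improvement.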