Parallel frame rendering: trading responsiveness for energy on a mobile GPU

Authors:
Jose-Maria Arnau;Joan-Manuel Parcerisa;Polychronis Xekalakis
Affiliations:
Universitat Politecnica de Catalunya, Barcelona, Spain;Universitat Politecnica de Catalunya, Barcelona, Spain;Intel Corporation, Santa Clara, USA
Venue:
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Year:
2013

Abstract

Improving energy efficiency is perhaps one of the most important design goals for smartphones and tablets. Unfortunately, rich media content applications typically put significant pressure on the GPU's memory subsystem. In this paper we propose a novel means of dramatically improving the energy efficiency of these devices for this popular type of application. The main hurdle in doing so is that GPUs require a significant amount of memory bandwidth to fetch all the necessary textures from memory. Although consecutive frames tend to operate on the same textures, their reuse distances are so large that, to the caches, texture fetching appears to be a streaming operation. Traditional designs increase the degree of multi-threading and the memory bandwidth as a means of improving performance. To meet the energy efficiency standards required by the mobile market, we need a different approach. We thus propose a technique that we term Parallel Frame Rendering (PFR). Under PFR, we split the GPU into two clusters that render two consecutive frames in parallel. PFR exploits the high degree of similarity between consecutive frames to save memory bandwidth by improving texture locality. Since the physics part of the rendering has to be computed sequentially for two consecutive frames, this naturally leads to an increase in input latency for PFR compared with traditional systems. However, we argue that this is rarely an issue, as the user interface in these devices is much slower than that of desktop systems. Moreover, we show that we can design reactive forms of PFR that bound the lag observed by the end user, thus maintaining the best user experience when necessary. Overall, we show that PFR can achieve memory bandwidth savings of 28% with only minimal loss in system responsiveness.
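The locality argument in the abstract can be illustrated with a toy cache model. The sketch below is not the authors' simulator or methodology. It assumes a single fully associative LRU cache shared by both frames, two frames that touch exactly the same texture blocks in the same order, and two renderers that advance in perfect lockstep. The function names, the block count, and the cache capacity are all hypothetical values chosen for illustration.

```python
from collections import OrderedDict


def count_misses(accesses, capacity):
    """Count misses of a fully associative LRU cache over a block trace."""
    cache = OrderedDict()  # keys are block ids, ordered from least to most recent
    misses = 0
    for block in accesses:
        if block in cache:
            cache.move_to_end(block)  # hit: mark as most recently used
        else:
            misses += 1
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used block
    return misses


def sequential_trace(frame_blocks, n_frames=2):
    """Frames rendered one after another: the whole block stream repeats."""
    return list(frame_blocks) * n_frames


def interleaved_trace(frame_blocks, n_frames=2):
    """Frames rendered in lockstep: each block is touched once per frame, back to back."""
    return [block for block in frame_blocks for _ in range(n_frames)]


# Hypothetical sizes: a frame touches 1000 texture blocks, the cache holds 256.
frame_blocks = range(1000)
capacity = 256

sequential_misses = count_misses(sequential_trace(frame_blocks), capacity)
parallel_misses = count_misses(interleaved_trace(frame_blocks), capacity)

print("sequential misses:", sequential_misses)  # 2000: no reuse survives between frames
print("parallel misses:  ", parallel_misses)    # 1000: the second frame hits on every block
```

In this idealized setting the second frame's accesses all hit, so off-chip traffic halves. Real consecutive frames are similar rather than identical, and the two clusters do not stay perfectly synchronized, so a smaller saving such as the 28% reported in the abstract is to be expected; the model only shows why shortening the reuse distance helps.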