Energy efficiency is perhaps one of the most important design goals for smartphones and tablets. Unfortunately, rich media applications typically put significant pressure on the GPU's memory subsystem. In this paper we propose a novel means of dramatically improving the energy efficiency of these devices for this popular class of applications. The main hurdle is that GPUs require a significant amount of memory bandwidth to fetch all the necessary textures from memory. Although consecutive frames tend to operate on the same textures, their re-use distances are so large that, from the caches' point of view, texture fetching looks like a streaming operation. Traditional designs increase the degree of multi-threading and the memory bandwidth as a means of improving performance. To meet the energy-efficiency standards required by the mobile market, we need a different approach. We thus propose a technique which we term Parallel Frame Rendering (PFR). Under PFR, we split the GPU into two clusters, each of which renders one of two consecutive frames in parallel. PFR exploits the high degree of similarity between consecutive frames to save memory bandwidth by improving texture locality. Since the physics part of the rendering has to be computed sequentially for two consecutive frames, this naturally increases the input lag of PFR compared with traditional systems. However, we argue that this is rarely an issue, as the user interface of these devices is much slower than that of desktop systems. Moreover, we show that we can design reactive forms of PFR that bound the lag observed by the end user, thus maintaining the best user experience when necessary. Overall, we show that PFR can achieve memory bandwidth savings of 28% with only a minimal loss in system responsiveness.
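The locality argument behind PFR can be illustrated with a toy cache model. The sketch below is not the simulator or workload used in the paper: it assumes a fully associative LRU texture cache, two consecutive frames that touch exactly the same texture blocks in the same order, and two clusters that advance in perfect lockstep. The names `count_misses`, `sequential_trace`, `parallel_trace`, `CAPACITY` and the block counts are hypothetical, chosen only for illustration.

```python
from collections import OrderedDict


def count_misses(accesses, capacity):
    """Count misses of a fully associative LRU cache over a block-ID trace."""
    cache = OrderedDict()
    misses = 0
    for block in accesses:
        if block in cache:
            cache.move_to_end(block)  # hit: mark as most recently used
        else:
            misses += 1
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used block
    return misses


def sequential_trace(frame_a, frame_b):
    """Conventional rendering: all of frame A, then all of frame B."""
    return list(frame_a) + list(frame_b)


def parallel_trace(frame_a, frame_b):
    """Idealized PFR: two clusters in lockstep, so accesses interleave."""
    trace = []
    for a, b in zip(frame_a, frame_b):
        trace.extend((a, b))
    return trace


# Hypothetical workload: each frame streams through 1024 texture blocks,
# far more than the 64-block cache can hold.
CAPACITY = 64
frame = list(range(1024))

seq_misses = count_misses(sequential_trace(frame, frame), CAPACITY)
par_misses = count_misses(parallel_trace(frame, frame), CAPACITY)
print(f"sequential misses: {seq_misses}")  # 2048: the re-use distance exceeds the cache
print(f"parallel misses:   {par_misses}")  # 1024: the second frame hits on every block
```

In this idealized setting the interleaved trace halves the number of misses, because each block is re-used immediately instead of one full frame later. Real consecutive frames are only similar, not identical, and the two clusters do not advance in strict lockstep, so the achievable saving is smaller; the figure reported above is 28%.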