Traditional graphics hardware architectures implement what we call the push architecture for texture mapping. Local memory is dedicated to the accelerator for fast local retrieval of texture during rasterization, and the application is responsible for managing this memory. The push architecture has a bandwidth advantage, but it suffers from limited texture capacity, escalating accelerator memory requirements (and therefore cost), and poor memory utilization. It also requires the programmer to solve the bin-packing problem of managing accelerator memory each frame. More recently, graphics hardware on PC-class machines has moved to an implementation of what we call the pull architecture, in which texture is stored in system memory and downloaded by the accelerator as needed. The pull architecture offers greater texture capacity, stems the escalation of accelerator memory requirements, and achieves good memory utilization. It also frees the programmer from managing accelerator texture memory. However, the pull architecture suffers from escalating requirements for bandwidth from main memory to the accelerator. In this paper we propose multi-level texture caching to provide the accelerator with the bandwidth advantages of the push architecture combined with the capacity advantages of the pull architecture. We have studied the feasibility of 2-level caching and found the following: (1) there is significant re-use of texture between frames; (2) L2 caching requires significantly less memory than the push architecture; (3) L2 caching requires significantly less bandwidth from host memory than the pull architecture; (4) L2 caching enables the implementation of smaller L1 caches that would otherwise bandwidth-limit accelerators on the workloads in this paper.
Results suggest that an L2 cache achieves the original advantage of the pull architecture, namely stemming the growth of local texture memory, while also stemming the current explosion in demand for texture bandwidth between host memory and the accelerator.
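To make the bandwidth argument concrete, the following is a minimal sketch of a two-level texture cache on a toy workload. It is not the simulator or the configuration used in the paper: the LRU replacement policy, the block granularity, the cache sizes, and the names `LRUCache` and `host_fetches` are all assumptions chosen for illustration. It counts how many texture blocks must come from host memory when consecutive frames re-use the same texture, with and without an L2 cache behind a small L1.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache of texture-block IDs (LRU replacement is an assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def access(self, block_id):
        """Return True on a hit; on a miss, insert the block and evict the oldest if full."""
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as most recently used
            return True
        self.blocks[block_id] = None
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used
        return False


def host_fetches(frames, l1_blocks, l2_blocks=0):
    """Count blocks fetched from host memory over a sequence of frames.

    With l2_blocks == 0 this models a pull-style design with only an L1 cache.
    """
    l1 = LRUCache(l1_blocks)
    l2 = LRUCache(l2_blocks) if l2_blocks > 0 else None
    fetched = 0
    for frame in frames:
        for block_id in frame:
            if l1.access(block_id):
                continue  # served by L1
            if l2 is not None and l2.access(block_id):
                continue  # served by L2 in accelerator-local memory
            fetched += 1  # missed every cache level: fetch from host memory
    return fetched


# Hypothetical workload: three frames that each touch the same 8 texture blocks.
frames = [list(range(8))] * 3
pull_only = host_fetches(frames, l1_blocks=4)
two_level = host_fetches(frames, l1_blocks=4, l2_blocks=16)
print(pull_only, two_level)
```

In this toy case the L1 alone is too small to hold a frame's working set, so every access goes to host memory, whereas the L2 captures the inter-frame re-use and only the first frame's blocks are fetched from the host. Real workloads and replacement policies will behave differently; the sketch only illustrates the mechanism.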