Multi-level texture caching for 3D graphics hardware

Authors:
Michael Cox;Narendra Bhandari;Michael Shantz
Affiliations:
MRJ/NASA Ames Research Center, Moffett Field, CA and Intel Microcomputer Research Labs, 2200 Mission College Blvd., Santa Clara, CA;Intel Microcomputer Research Labs, 2200 Mission College Blvd., Santa Clara, CA;Intel Microcomputer Research Labs, 2200 Mission College Blvd., Santa Clara, CA
Venue:
Proceedings of the 25th annual international symposium on Computer architecture
Year:
1998

Abstract

Traditional graphics hardware architectures implement what we call the push architecture for texture mapping. Local memory is dedicated to the accelerator for fast local retrieval of texture during rasterization, and the application is responsible for managing this memory. The push architecture has a bandwidth advantage, but it has the disadvantages of limited texture capacity, escalating accelerator memory requirements (and therefore cost), and poor memory utilization. The push architecture also requires the programmer to solve the bin-packing problem of managing accelerator memory each frame. More recently, graphics hardware on PC-class machines has moved to an implementation of what we call the pull architecture. Texture is stored in system memory and downloaded by the accelerator as needed. The pull architecture offers greater texture capacity, stems the escalation of accelerator memory requirements, and achieves good memory utilization. It also frees the programmer from accelerator texture memory management. However, the pull architecture suffers escalating requirements for bandwidth from main memory to the accelerator. In this paper we propose multi-level texture caching to provide the accelerator with the bandwidth advantages of the push architecture combined with the capacity advantages of the pull architecture. We have studied the feasibility of 2-level caching and found the following: (1) significant re-use of texture between frames; (2) L2 caching requires significantly less memory than the push architecture; (3) L2 caching requires significantly less bandwidth from host memory than the pull architecture; (4) L2 caching enables implementation of smaller L1 caches that would otherwise bandwidth-limit accelerators on the workloads in this paper.
Results suggest that an L2 cache achieves the original advantage of the pull architecture (stemming the growth of local texture memory) while at the same time stemming the current explosion in demand for texture bandwidth between host memory and the accelerator.
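
The inter-frame re-use argument can be illustrated with a toy simulation. The sketch below is not the authors' simulator or methodology. It assumes LRU replacement at both levels, fixed-size texture blocks identified by integers, and a synthetic workload in which every frame touches the same blocks in the same order. The names `LRUCache` and `host_fetches` and all sizes are hypothetical, chosen only for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity set of texture block ids with LRU replacement (assumed policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def access(self, block_id):
        """Return True on a hit; on a miss, insert the block, evicting the LRU entry if full."""
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as most recently used
            return True
        if self.capacity > 0:
            if len(self.blocks) >= self.capacity:
                self.blocks.popitem(last=False)  # evict least recently used
            self.blocks[block_id] = None
        return False


def host_fetches(frames, l1_blocks, l2_blocks):
    """Count blocks downloaded from host memory over a sequence of frames.

    l2_blocks == 0 models an accelerator with only an L1 cache, so every
    L1 miss goes to host memory.
    """
    l1 = LRUCache(l1_blocks)
    l2 = LRUCache(l2_blocks)
    fetches = 0
    for frame in frames:
        for block_id in frame:
            if l1.access(block_id):
                continue  # served from on-chip L1
            if not l2.access(block_id):
                fetches += 1  # missed both levels: download from host memory
    return fetches


# Hypothetical workload: 3 frames, each touching the same 64 texture blocks.
frames = [list(range(64)) for _ in range(3)]

l1_only = host_fetches(frames, l1_blocks=16, l2_blocks=0)
two_level = host_fetches(frames, l1_blocks=16, l2_blocks=128)
print("host fetches, L1 only:", l1_only)
print("host fetches, L1 + L2:", two_level)
```

In this synthetic case the per-frame working set exceeds the L1 capacity, so the L1-only configuration downloads every block again each frame, while the two-level configuration downloads each block once and serves later frames from L2. Real workloads have partial rather than perfect re-use between frames, so the numbers here show only the direction of the effect, not its magnitude.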