An instruction-systolic programmable shader architecture for multi-threaded 3D graphics processing

Authors:
Jung-Wook Park, Hoon-Mo Yang, Gi-Ho Park, Shin-Dug Kim, Charles C. Weems
Affiliations:
Jung-Wook Park, Hoon-Mo Yang, Shin-Dug Kim: Department of Computer Science, C532, Yonsei University, 134 Shinchon-dong Seoul, 120-749, Republic of Korea
Gi-Ho Park: Department of Computer Engineering, Sejong University, 98 Kunja-Dong, Kwangjin-Ku, Seoul, 143-747, Republic of Korea
Charles C. Weems: Department of Computer Science, University of Massachusetts Amherst, MA 01003-4610, United States
Venue:
Journal of Parallel and Distributed Computing
Year:
2010

Abstract

To meet both the performance and the programmability demands of 3D graphics applications, recent graphics processing units (GPUs) employ vector and multithreaded SIMD architectures. In these parallel architectures, however, cache misses and dynamic branches introduce additional latency and complicate management. This paper introduces a novel instruction-systolic array architecture that transfers an instruction stream in a pipelined fashion, so that the expensive functional resources of a graphics processor can be shared efficiently. To address the latency and management problem, we combine this systolic execution scheme with on-demand warp activation, which handles cache-miss latency and branch divergence efficiently without significantly increasing hardware resources, in terms of either logic or register space. Simulation indicates that the proposed architecture delivers 25% higher performance than a traditional SIMD architecture with the same resources, and requires significantly fewer resources to match the performance of a typical modern vector multithreaded GPU architecture.
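The abstract describes the instruction-systolic idea only in words. The following is a toy Python model, written under our own assumptions rather than taken from the paper, of that basic scheme: a single instruction stream enters the first processing element (PE) and is shifted one PE per cycle, so PE p executes instruction k at cycle k + p while each PE works on its own thread's registers. The names `run_systolic` and `OPS`, the four-field instruction format, and the one-thread-per-PE layout are all hypothetical. The model deliberately omits on-demand warp activation, caches, and branches.

```python
"""Toy model of an instruction-systolic array (hypothetical sketch).

Assumptions, not taken from the paper: a 1-D chain of processing elements
(PEs), one thread per PE, a two-operation instruction set, and no stalls,
branches, or caches.
"""

# Hypothetical operation table: op name -> binary function on register values.
OPS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}


def run_systolic(program, initial_regs):
    """Run `program` on every PE, shifting each instruction one PE per cycle.

    program: list of (op, dst, src_a, src_b) tuples, fetched once each.
    initial_regs: one register dict per PE.
    Returns (final register dicts, trace of (cycle, pe, instr_index)).
    """
    n_pe = len(initial_regs)
    regs = [dict(r) for r in initial_regs]  # private register file per PE
    pipeline = [None] * n_pe  # pipeline[p] = instruction index held by PE p
    trace = []
    for cycle in range(len(program) + n_pe - 1):
        # Shift the instruction stream right by one PE; PE 0 receives the
        # next instruction from the single shared fetch unit (None once the
        # whole program has been issued).
        fetched = cycle if cycle < len(program) else None
        pipeline = [fetched] + pipeline[:-1]
        # Every PE executes the instruction it currently holds on its own
        # registers, so PE p runs instruction k at cycle k + p.
        for pe, idx in enumerate(pipeline):
            if idx is None:
                continue
            op, dst, a, b = program[idx]
            regs[pe][dst] = OPS[op](regs[pe][a], regs[pe][b])
            trace.append((cycle, pe, idx))
    return regs, trace


if __name__ == "__main__":
    # y = x * x + c on four PEs, each holding a different x.
    program = [("mul", "t", "x", "x"), ("add", "y", "t", "c")]
    regs_in = [{"x": float(i), "c": 1.0} for i in range(4)]
    regs_out, trace = run_systolic(program, regs_in)
    print([r["y"] for r in regs_out])  # -> [1.0, 2.0, 5.0, 10.0]
```

In this toy model the shared fetch unit issues each instruction once regardless of the number of PEs, and the cost is a start-up skew of n_pe - 1 cycles before the last PE begins. How the paper's design hides cache-miss and divergence stalls through on-demand warp activation is not modeled here.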