Programmable and Scalable Architecture for Graphics Processing Units

Authors:
Carlos S. Lama;Pekka Jääskeläinen;Jarmo Takala
Affiliations:
Department of Computer Architecture, Computer Science and Artificial Intelligence, Universidad Rey Juan Carlos, Madrid, Spain 28933 Móstoles;Department of Computer Systems, Tampere University of Technology, Tampere, Finland 33720;Department of Computer Systems, Tampere University of Technology, Tampere, Finland 33720
Venue:
SAMOS '09 Proceedings of the 9th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
Year:
2009

Abstract

Graphics processing is an application area with a high level of parallelism at both the data level and the task level. Therefore, graphics processing units (GPUs) are often implemented as multiprocessing systems with high-performance floating-point processing and application-specific hardware stages for maximizing graphics throughput. In this paper, we evaluate the suitability of Transport Triggered Architectures (TTA) as a basis for implementing GPUs. TTA improves scalability over traditional VLIW-style architectures, making it interesting for computationally intensive applications. We show that TTA provides high floating-point processing performance while allowing more programming freedom than vector processors. Finally, one of the main features of the presented TTA-based GPU design is its fully programmable architecture, which makes it a suitable target for the general-purpose computing on GPU (GPGPU) APIs that have become popular in recent years.