Improving 3D geometry transformations on a simultaneous multithreaded SIMD processor

Authors:
Claude Limousin;Julien Sebot;Alexis Vartanian;Nathalie Drach-Temam
Affiliations:
Laboratoire de Recherche en Informatique, Université Paris-Sud, F-91405 Orsay Cedex;Laboratoire de Recherche en Informatique, Université Paris-Sud, F-91405 Orsay Cedex;Laboratoire de Recherche en Informatique, Université Paris-Sud, F-91405 Orsay Cedex;Laboratoire de Recherche en Informatique, Université Paris-Sud, F-91405 Orsay Cedex
Venue:
ICS '01 Proceedings of the 15th international conference on Supercomputing
Year:
2001

Abstract

In this paper we evaluate the performance of a simultaneous multithreaded (SMT) processor used as the geometry processor for a 3D polygonal rendering engine. To evaluate this approach, we consider PMesa (a parallel version of Mesa), which parallelizes the geometry stage of the 3D pipeline. We show that SMT is suitable for 3D geometry, and we characterize the execution of the geometry stage in terms of the memory hierarchy, which is the main bottleneck. The results show that SMT does not fully hide memory latency, and that L2 data prefetching does not succeed in improving performance. We show that this problem comes from pollution of the instruction window by threads experiencing L2 cache misses, which reduces the window available to the other threads. We therefore propose dcPRED, a hardware mechanism that predicts L2 misses and controls this pollution. Coupled with L2 data prefetching, dcPRED achieves gains of up to 21% over the baseline SMT.
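
As a rough illustration of what parallelizing the geometry stage can mean, the sketch below splits a vertex list into contiguous chunks and applies the same 4x4 transformation matrix to each chunk in a separate thread. It is not PMesa's code: the function names, the chunking scheme, and the use of Python's thread pool are assumptions made for this example, and Python threads only show the work decomposition, not the hardware-level overlap an SMT processor would provide.

```python
from concurrent.futures import ThreadPoolExecutor


def transform_vertex(m, v):
    """Multiply a 4x4 row-major matrix m by a homogeneous vertex (x, y, z, w)."""
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))


def transform_chunk(m, chunk):
    """Transform every vertex of one chunk; each chunk is independent work."""
    return [transform_vertex(m, v) for v in chunk]


def parallel_transform(m, vertices, n_threads=4):
    """Hypothetical helper: one contiguous chunk of vertices per worker thread."""
    if not vertices:
        return []
    size = -(-len(vertices) // n_threads)  # ceiling division
    chunks = [vertices[i:i + size] for i in range(0, len(vertices), size)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # map() preserves chunk order, so the output order matches the input.
        results = pool.map(lambda c: transform_chunk(m, c), chunks)
    return [v for part in results for v in part]


# Example: translate three vertices by (1, 2, 3) using two worker threads.
translate = [
    [1, 0, 0, 1],
    [0, 1, 0, 2],
    [0, 0, 1, 3],
    [0, 0, 0, 1],
]
points = [(0, 0, 0, 1), (1, 1, 1, 1), (2, 0, 0, 1)]
moved = parallel_transform(translate, points, n_threads=2)
```

Because each vertex is transformed independently, the chunks share only the read-only matrix, which is the kind of thread-level parallelism the abstract describes exploiting in the geometry stage.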