In this paper we evaluate the performance of an SMT processor used as the geometry processor of a 3D polygonal rendering engine. For this evaluation we use PMesa, a parallel version of Mesa that parallelizes the geometry stage of the 3D pipeline. We show that SMT is suitable for 3D geometry processing, and we characterize the execution of the geometry stage in terms of the memory hierarchy, which is the main bottleneck. The results show that SMT does not fully hide memory latency, and that L2 data prefetching does not succeed in improving performance. We show that the problem comes from pollution of the instruction window by threads experiencing L2 cache misses, which reduces the window available to the other threads. We therefore propose dcPRED, a hardware mechanism that predicts L2 misses and controls this pollution. Coupled with L2 data prefetching, dcPRED achieves gains of up to 21% over the baseline SMT.
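The abstract does not describe how dcPRED is built, so the following is only an illustrative sketch of the general idea of predicting L2 misses and keeping the affected threads from filling the instruction window. Everything in it is an assumption made for illustration and not the paper's design: the per-PC table of 2-bit saturating counters, the table size, the threshold, and the names `L2MissPredictor` and `fetchable_threads`.

```python
from dataclasses import dataclass, field


@dataclass
class L2MissPredictor:
    """Hypothetical L2-miss predictor: one 2-bit saturating counter per table entry.

    This is an assumed structure for illustration, not the dcPRED design.
    """
    entries: int = 1024      # assumed table size
    threshold: int = 2       # counter value at which a miss is predicted
    table: list = field(default_factory=list)

    def __post_init__(self):
        # All counters start at 0, i.e. "predict hit".
        self.table = [0] * self.entries

    def _index(self, pc: int) -> int:
        # Drop the two low bits (word-aligned instructions) and wrap into the table.
        return (pc >> 2) % self.entries

    def predict_miss(self, pc: int) -> bool:
        return self.table[self._index(pc)] >= self.threshold

    def update(self, pc: int, missed_l2: bool) -> None:
        # Train on the actual outcome of the load, saturating at 0 and 3.
        i = self._index(pc)
        if missed_l2:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def fetchable_threads(pending_load_pcs: dict, predictor: L2MissPredictor) -> list:
    """Return the thread ids allowed to fetch this cycle.

    A thread is held back while any of its in-flight loads is predicted to
    miss in L2, so it cannot occupy window entries the other threads could use.
    """
    return [
        tid
        for tid, pcs in sorted(pending_load_pcs.items())
        if not any(predictor.predict_miss(pc) for pc in pcs)
    ]
```

In this sketch, a thread whose pending load has missed L2 twice in a row is withheld from fetch until the counter decays, while threads with no predicted misses keep fetching. A real mechanism would also have to decide when to resume the stalled thread and how to interact with the prefetcher, which the abstract leaves unspecified.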