Significantly improving the visual quality of real-time 3D graphics requires modeling complex illumination effects such as soft shadows, reflections, and diffuse lighting interactions. The conventional GPU model, built around the Z-buffer algorithm, does not adequately support these effects. This paper targets the entire graphics system stack and demonstrates algorithms, a software architecture, and a hardware architecture for real-time rendering based on a paradigm shift to ray tracing. The three distinguishing features of our system, called Copernicus, are support for dynamic scenes, high image quality, and execution on programmable multicore architectures. The focus of this paper is the synergy and interaction between applications, architecture, and evaluation. First, we describe the ray-tracing algorithms, which are designed to use redundancy and partitioning to achieve locality. Second, we describe the architecture, which uses ISA specialization and multithreading to hide memory delays, and supports only local coherence. Finally, we develop an analytical performance model for our 128-core system, using measurements from simulation and from a scaled-down prototype system. More generally, this paper addresses the important issue of mechanisms and evaluation for challenging workloads on future processors. Our results show that a single 8-core tile (each core 4-way multithreaded) can be almost 100% utilized and can sustain 10 million rays/second. Sixteen such tiles, which can fit on a 240 mm² chip in 22 nm technology, make up the system and, with our anticipated improvements in algorithms, can sustain real-time rendering. The mechanisms and the architecture can potentially support other domains, such as irregular scientific computations and physics computations.
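To put the quoted figures in perspective, the following is a back-of-the-envelope sketch, not the paper's analytical model. The tile, core, thread, and rays/second numbers come from the abstract. The linear scaling across tiles (`scaling_efficiency=1.0`), the 1024×768 resolution, and the 30 frames/second target are illustrative assumptions, and the function names are hypothetical.

```python
# Figures quoted in the abstract.
CORES_PER_TILE = 8
THREADS_PER_CORE = 4
TILES = 16
RAYS_PER_TILE_PER_SEC = 10e6


def system_summary(tiles=TILES, scaling_efficiency=1.0):
    """Return (cores, hardware threads, aggregate rays/second).

    scaling_efficiency is an assumption: 1.0 means throughput scales
    perfectly linearly with the number of tiles, which the abstract
    does not claim.
    """
    cores = tiles * CORES_PER_TILE
    threads = cores * THREADS_PER_CORE
    rays_per_sec = tiles * RAYS_PER_TILE_PER_SEC * scaling_efficiency
    return cores, threads, rays_per_sec


def rays_per_pixel(rays_per_sec, width, height, fps):
    """Average ray budget per pixel per frame at a given resolution and frame rate."""
    return rays_per_sec / (width * height * fps)


cores, threads, rays = system_summary()
# Resolution and frame rate below are illustrative, not from the paper.
budget = rays_per_pixel(rays, width=1024, height=768, fps=30)
print(f"{cores} cores, {threads} threads, {rays / 1e6:.0f} M rays/s")
print(f"about {budget:.1f} rays per pixel per frame at 1024x768, 30 fps")
```

Under these assumptions the 16-tile system has 128 cores and 512 hardware threads, and the per-pixel ray budget is in the single digits. That is consistent with the abstract's remark that real-time rendering also depends on anticipated algorithmic improvements.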