In this paper we present new functionality that we added to the Chromium framework. When tiled displays are driven in a sort-first configuration based on the Tilesort stream processing unit (SPU), the performance bottlenecks are the high utilization of the client host, caused by the expensive sorting and bucketing of geometry, and the high bandwidth consumption, caused by a significant amount of redundant unicast transmissions. We address these problems with an implementation of a true point-to-multipoint connection type using UDP multicast. Based on this functionality we developed the so-called OPT-SPU, which replaces the widely used Tilesort SPU in typical sort-first environments. Tile sorting and state differencing are not necessary because multicasting allows us to send the geometry to all server nodes at once. Instead of tile sorting, a conventional frustum culling method is used to keep the servers from rendering geometry that lies outside their viewports. This approach leads to significantly lower processor and memory load on the client and a very effective utilization of the available network bandwidth. To avoid redundant transmission of identical command sequences that the application generates several times, we put a transparent stream cache into the multicast communication channel. In addition, frustum culling and hardware-accelerated occlusion culling may be used to eliminate unnecessary transfer of invisible geometry. Finally, a software-based method for synchronizing buffer swap operations across all servers was implemented. In a nutshell, an appropriate combination of our optimizations makes it possible, for the first time, to render large scenes synchronously on an arbitrary number of tiles at nearly constant performance.
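The idea of a transparent stream cache in the multicast channel can be illustrated with a minimal Python sketch. It is not the OPT-SPU implementation, and none of the names below belong to Chromium: `CachingSender`, `CachingReceiver`, and `multicast` are hypothetical, keying the cache by a SHA-1 digest is an assumption (the abstract does not say how identical sequences are recognized), and the UDP multicast send is replaced by an in-process loop over receivers.

```python
import hashlib


class CachingSender:
    """Client side (hypothetical): sends a repeated command sequence as a short key."""

    def __init__(self):
        self.seen = set()  # digests of sequences already sent in full

    def encode(self, commands):
        key = hashlib.sha1(commands).digest()  # assumed cache key
        if key in self.seen:
            return ("REF", key, b"")       # receivers already hold the payload
        self.seen.add(key)
        return ("DATA", key, commands)     # first occurrence: send in full


class CachingReceiver:
    """Server side (hypothetical): stores full payloads and resolves references."""

    def __init__(self):
        self.store = {}

    def decode(self, message):
        kind, key, payload = message
        if kind == "DATA":
            self.store[key] = payload
            return payload
        # A KeyError here would mean the earlier DATA message never arrived.
        return self.store[key]


def multicast(message, receivers):
    """Stand-in for one UDP multicast send: every receiver gets the same message."""
    return [receiver.decode(message) for receiver in receivers]


sender = CachingSender()
servers = [CachingReceiver() for _ in range(4)]
frame = b"glBegin;glVertex3f(0,0,0);glEnd"

first = sender.encode(frame)    # full payload goes out once
second = sender.encode(frame)   # repeat goes out as a 20-byte reference
results_first = multicast(first, servers)
results_second = multicast(second, servers)
```

Because the cache sits inside the channel, neither the application nor the renderer needs to know about it: the receiver hands back the same bytes whether they arrived in full or as a reference. A real implementation over UDP would also have to handle lost packets and cache eviction, which this sketch leaves out.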