Binary Mesh Partitioning for Cache-Efficient Visualization

Authors:
Marc Tchiboukdjian;Vincent Danjean;Bruno Raffin
Affiliations:
CNRS and CEA/DAM, DIF, ENSIMAG-Antenne de Montbonnot, Montbonnot Saint Martin;Grenoble Universités and LIG, ENSIMAG-Antenne de Montbonnot, Montbonnot Saint Martin;INRIA and LIG, ENSIMAG-Antenne de Montbonnot, Montbonnot Saint Martin
Venue:
IEEE Transactions on Visualization and Computer Graphics
Year:
2010

Abstract

An important bottleneck when visualizing large data sets is the data transfer between processor and memory. Cache-aware (CA) and cache-oblivious (CO) algorithms take the memory hierarchy into account to achieve cache efficiency. CO approaches have the advantage of adapting to unknown and varying memory hierarchies. Recent CA and CO algorithms developed for 3D mesh layouts significantly improve on the performance of previous approaches, but they lack theoretical performance guarantees. In this paper, we present an O(N \log N) algorithm to compute a CO layout for unstructured but well-shaped meshes. We prove that a coherent traversal of a mesh of size N in dimension d induces fewer than N/B + O(N/M^{1/d}) cache misses, where B and M are the block size and the cache size, respectively. Experiments show that our layout computation is faster and uses significantly less memory than the best known CO algorithm. Performance is comparable to that algorithm for the access patterns of classical visualization algorithms, and better when the BSP tree produced while computing the layout is used as an acceleration data structure adjusted to the layout. We also show that cache-oblivious approaches lead to significant performance increases on recent GPU architectures.
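The idea of deriving a layout from a recursive binary partition, and the shape of the cache-miss bound, can be illustrated with a minimal sketch. This is not the authors' algorithm: it is a simplified stand-in that splits vertex coordinates at the median along alternating axes (kd-tree style) and lists the leaves in order. Because it re-sorts at every level it runs in O(N log^2 N), not the O(N log N) stated in the abstract. The function names `bsp_layout` and `miss_bound_leading_terms`, and the constant `c` standing in for the hidden constant of the O(.) term, are assumptions made for illustration only.

```python
import random


def bsp_layout(points, indices=None, depth=0):
    """Order vertex indices by recursive median splits (illustrative only).

    points  : list of equal-length coordinate tuples
    returns : a permutation of range(len(points)); vertices that end up in
              the same subtree are stored contiguously in the layout
    """
    if indices is None:
        indices = list(range(len(points)))
    if len(indices) <= 1:
        return list(indices)
    # Cycle through the coordinate axes, one per recursion level.
    axis = depth % len(points[indices[0]])
    ordered = sorted(indices, key=lambda i: points[i][axis])
    mid = len(ordered) // 2
    # Lay out the left half first, then the right half.
    return (bsp_layout(points, ordered[:mid], depth + 1)
            + bsp_layout(points, ordered[mid:], depth + 1))


def miss_bound_leading_terms(n, block, cache, dim, c=1.0):
    """Evaluate N/B + c * N / M^(1/d).

    c is a hypothetical constant replacing the O(.) in the stated bound,
    so the result only indicates how the bound scales.
    """
    return n / block + c * n / cache ** (1.0 / dim)


if __name__ == "__main__":
    random.seed(0)
    pts = [(random.random(), random.random(), random.random())
           for _ in range(16)]
    print(bsp_layout(pts))
    print(miss_bound_leading_terms(n=10**6, block=64, cache=2**20, dim=3))
```

The layout never refers to B or M, which is what makes an ordering of this kind cache-oblivious; the cache parameters appear only when the bound is evaluated.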