The Tau Parallel Performance System
International Journal of High Performance Computing Applications
Stacked-widget visualization of scheduling-based algorithms
Proceedings of the 4th ACM symposium on Software visualization
Ocelot: a dynamic optimization framework for bulk-synchronous applications in heterogeneous systems
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Interpreting Performance Data across Intuitive Domains
ICPP '11 Proceedings of the 2011 International Conference on Parallel Processing
Interactive visualization for memory reference traces
EuroVis'08 Proceedings of the 10th Joint Eurographics / IEEE - VGTC conference on Visualization
Visualizing program memory behavior using memory reference traces
We present an approach for investigating the memory behavior of a parallel kernel executing on thousands of threads simultaneously within the CUDA architecture. Our top-down approach makes it possible to quickly identify significant differences in execution among the many blocks and warps. Once interesting warps are identified, their memory behavior can be investigated further by visualizing shared memory bank conflicts and global memory coalescing, first in an overview of a single warp across many operations and then in a detailed view of a single warp and a single operation. We demonstrate the strength of our approach on a parallel matrix transpose kernel and a parallel 1D Haar wavelet transform kernel.
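To make the two per-warp quantities named above concrete, the following minimal sketch computes a bank conflict degree and a count of global memory segments touched for one warp and one operation. It is not the tool described in the abstract. The function names, the 32-thread warp, the 32 four-byte-wide shared memory banks, the 128-byte global memory segments, and the 32 x 32 tile are all assumptions chosen for illustration; actual values depend on the GPU generation.

```python
from collections import defaultdict

def bank_conflict_degree(word_indices, num_banks=32):
    """Worst-case number of distinct words that map to the same bank.

    Assumes word w lives in bank w % num_banks, and that threads reading
    the same word are served by a broadcast (so duplicates do not conflict).
    A result of 1 means the access is conflict-free.
    """
    words_per_bank = defaultdict(set)
    for w in word_indices:
        words_per_bank[w % num_banks].add(w)
    return max((len(words) for words in words_per_bank.values()), default=0)

def global_segments_touched(byte_addresses, segment_bytes=128):
    """Number of distinct aligned segments covered by one warp-wide access.

    Assumes a simplified model in which fewer segments means better coalescing.
    """
    return len({addr // segment_bytes for addr in byte_addresses})

WARP = 32   # assumed threads per warp
TILE = 32   # assumed tile width for a transpose-style kernel

# Shared memory: each thread t reads element (t, 0), i.e. a column of the tile.
unpadded_column = [t * TILE for t in range(WARP)]        # row stride 32 words
padded_column = [t * (TILE + 1) for t in range(WARP)]    # row stride 33 words

# Global memory: 4-byte floats, either consecutive or one per 1024-float row.
row_access = [4 * t for t in range(WARP)]
column_access = [4 * t * 1024 for t in range(WARP)]

print("unpadded column conflict degree:", bank_conflict_degree(unpadded_column))
print("padded column conflict degree:  ", bank_conflict_degree(padded_column))
print("row access segments:            ", global_segments_touched(row_access))
print("column access segments:         ", global_segments_touched(column_access))
```

Under these assumptions, the unpadded column read places all 32 words in a single bank, while padding each row by one word spreads them over 32 banks. Likewise, the consecutive global access falls within one segment and the strided one touches a separate segment per thread. Per-warp, per-operation numbers of this kind are what a detailed view of a single warp and a single operation would plausibly display.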