High-performance computing environments are not freely available to every scientist or programmer. However, massively parallel computational devices are built into nearly every workstation-class computer and laptop sold today. The programmable GPU puts immense computational power in the hands of a user in a standard office environment; programming a GPU to run efficiently, however, is not a trivial task. A primary concern is memory latency: if not managed properly, it degrades GPU performance, leaving the device idle while waiting for data. In this paper we describe an optimization of memory access methods on GPUs using Morton-order indexing, sometimes referred to as Z-order indexing.