The availability of easily programmable manycore CPUs and GPUs has motivated investigations into how to best exploit their tremendous computational power for scientific computing. Here we demonstrate how a systems biology application (detection and tracking of white blood cells in video microscopy) can be accelerated by 200× using a CUDA-capable GPU. Because the algorithms and implementation challenges are common to a wide range of applications, we discuss general techniques that allow programmers to make efficient use of a manycore GPU.