Multi-core processors are no longer the future of computing; they are the present-day reality. A typical mass-produced CPU features multiple processor cores, while a GPU (Graphics Processing Unit) may have hundreds or even thousands of cores. With the rise of multi-core architectures has come the need to teach advanced programmers a new and essential skill: how to program massively parallel processors.

Programming Massively Parallel Processors: A Hands-on Approach shows students and professionals alike the basic concepts of parallel programming and GPU architecture. It explores various techniques for constructing parallel programs in detail. Case studies demonstrate the development process, which begins with computational thinking and ends with effective and efficient parallel programs.

Teaches computational thinking and problem-solving techniques that facilitate high-performance parallel computing.
Utilizes CUDA (Compute Unified Device Architecture), NVIDIA's software development tool created specifically for massively parallel environments.
Shows you how to achieve both high performance and high reliability using the CUDA programming model as well as OpenCL.