The advent of multicore CPUs and manycore GPUs means that mainstream processor chips are now parallel systems. Furthermore, their parallelism continues to scale with Moore's law. The challenge is to develop mainstream application software that transparently scales its parallelism to leverage the increasing number of processor cores, much as 3D graphics applications transparently scale their parallelism to manycore GPUs with widely varying numbers of cores.
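The following is a minimal sketch of what "transparently scales its parallelism" can mean in practice. It is written in standard C++ rather than CUDA so that it runs on any CPU. The work is split into fixed-size, independent blocks whose count depends only on the data size. A pool of workers then pulls blocks from a shared counter until none remain. This loosely mirrors how CUDA schedules independent thread blocks onto however many multiprocessors a GPU has. The function name `saxpy_blocked`, the block size, and the counter-based scheduling are illustrative assumptions, not code from the article.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical example: computes y = a*x + y over independent blocks.
// The number of blocks is fixed by the problem size; the number of workers
// can be anything >= 1, and the result is the same either way.
void saxpy_blocked(float a, const std::vector<float>& x, std::vector<float>& y,
                   std::size_t block_size, unsigned num_workers) {
    assert(x.size() == y.size() && block_size > 0);
    if (num_workers == 0) num_workers = 1;  // always run at least one worker

    const std::size_t n = x.size();
    const std::size_t num_blocks = (n + block_size - 1) / block_size;
    std::atomic<std::size_t> next_block{0};  // shared queue of block indices

    // Each worker repeatedly claims the next unprocessed block. Blocks are
    // disjoint, so no two threads ever write the same element of y.
    auto worker = [&]() {
        for (std::size_t b = next_block.fetch_add(1); b < num_blocks;
             b = next_block.fetch_add(1)) {
            const std::size_t begin = b * block_size;
            const std::size_t end = std::min(begin + block_size, n);
            for (std::size_t i = begin; i < end; ++i) y[i] = a * x[i] + y[i];
        }
    };

    std::vector<std::thread> pool;
    for (unsigned w = 0; w < num_workers; ++w) pool.emplace_back(worker);
    for (auto& t : pool) t.join();  // all blocks are done after the joins
}
```

A caller would typically pass `std::max(1u, std::thread::hardware_concurrency())` as `num_workers`. The same program then uses 2, 8, or 64 cores without modification, because it never encodes the core count in how the work is decomposed.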