To enable flexible, programmable graphics and high-performance computing, NVIDIA has developed the Tesla scalable unified graphics and parallel computing architecture. Its scalable parallel array of processors is massively multithreaded and programmable in C or via graphics APIs.