A graphics system architecture for interactive application-specific display functions
IEEE Computer Graphics and Applications
Pixel-planes 5: a heterogeneous multiprocessor graphics system using processor-enhanced memories
SIGGRAPH '89 Proceedings of the 16th annual conference on Computer graphics and interactive techniques
Scale-Space and Edge Detection Using Anisotropic Diffusion
IEEE Transactions on Pattern Analysis and Machine Intelligence
PixelFlow: high-speed rendering using image composition
SIGGRAPH '92 Proceedings of the 19th annual conference on Computer graphics and interactive techniques
Interactive multi-pass programmable shading
Proceedings of the 27th annual conference on Computer graphics and interactive techniques
Polygon rendering on a stream architecture
HWWS '00 Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware
Basic Linear Algebra Subprograms for Fortran Usage
ACM Transactions on Mathematical Software (TOMS)
Communications of the ACM - Special issue on computer architecture
A user-programmable vertex engine
Proceedings of the 28th annual conference on Computer graphics and interactive techniques
Ray tracing on programmable graphics hardware
Proceedings of the 29th annual conference on Computer graphics and interactive techniques
Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware
Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware
Efficient partitioning of fragment shaders for multipass rendering on programmable graphics hardware
Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware
Imagine: Media Processing with Streams
IEEE Micro
Using modern graphics architectures for general-purpose computing: a framework and analysis
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
High level compilation for fine grained FPGAs
FCCM '97 Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines
Simulation of cloud dynamics on graphics hardware
Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware
Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware
ICCD '02 Proceedings of the 2002 IEEE International Conference on Computer Design: VLSI in Computers and Processors (ICCD'02)
Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture
Proceedings of the 30th annual international symposium on Computer architecture
Cg: a system for programming graphics hardware in a C-like language
ACM SIGGRAPH 2003 Papers
Linear algebra operators for GPU implementation of numerical algorithms
ACM SIGGRAPH 2003 Papers
Sparse matrix solvers on the GPU: conjugate gradients and multigrid
ACM SIGGRAPH 2003 Papers
A Media-Enhanced Vector Architecture for Embedded Memory Systems
Performance of Various Computers Using Standard Linear Equations Software
A programming system for the Imagine media processor
ACM SIGGRAPH 2004 Papers
Merrimac: Supercomputing with Streams
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Fast Volume Segmentation With Simultaneous Visualization Using Programmable Graphics Hardware
Proceedings of the 14th IEEE Visualization 2003 (VIS'03)
Cheops: a reconfigurable data-flow system for video processing
IEEE Transactions on Circuits and Systems for Video Technology
ACM SIGGRAPH 2004 Papers
Scout: A Hardware-Accelerated System for Quantitatively Driven Visualization and Analysis
VIS '04 Proceedings of the conference on Visualization '04
GPU Cluster for High Performance Computing
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Functionality Distribution for Parallel Rendering
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Efficient partitioning of fragment shaders for multiple-output hardware
Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware
Understanding the efficiency of GPU algorithms for matrix-matrix multiplication
Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware
Cache aware optimization of stream programs
LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Teleport messaging for distributed stream programs
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Fast and approximate stream mining of quantiles and frequencies using graphics processors
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
KD-tree acceleration structures for a GPU raytracer
Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware
A reconfigurable architecture for load-balanced rendering
Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware
Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware
Optimizing stream programs using linear state space analysis
Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
Automatic Tuning Matrix Multiplication Performance on Graphics Hardware
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Stream Programming on General-Purpose Processors
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
LU-GPU: Efficient Algorithms for Solving Dense Linear Systems on Graphics Hardware
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
ClawHMMER: A Streaming HMMer-Search Implementation
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
I3D '06 Proceedings of the 2006 symposium on Interactive 3D graphics and games
Data and Computation Transformations for Brook Streaming Applications on Multiprocessors
Proceedings of the International Symposium on Code Generation and Optimization
Glift: Generic, efficient, random-access GPU data structures
ACM Transactions on Graphics (TOG)
A versatile stereo implementation on commodity graphics hardware
Real-Time Imaging
Discrete-event Execution Alternatives on General Purpose Graphical Processing Units (GPGPUs)
Proceedings of the 20th Workshop on Principles of Advanced and Distributed Simulation
HDR VolVis: High Dynamic Range Volume Visualization
IEEE Transactions on Visualization and Computer Graphics
MPI Microtask for programming the Cell Broadband Engine™ processor
IBM Systems Journal
ACM SIGGRAPH 2006 Papers
Graphical Models - Special issue on PG2004
Voronoi-diagram based heuristics for the location of mobile and unreliable service providers
ACST'06 Proceedings of the 2nd IASTED international conference on Advances in computer science and technology
Hierarchical clustering of gene expression profiles with graphics hardware acceleration
Pattern Recognition Letters
Exploiting coarse-grained task, data, and pipeline parallelism in stream programs
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Accelerator: using data parallelism to program GPUs for general-purpose uses
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Proceedings of the 4th international conference on Computer graphics and interactive techniques in Australasia and Southeast Asia
Chessboard domination on programmable graphics hardware
Proceedings of the 44th annual Southeast regional conference
Teaching programmable shaders: lightweight versus heavyweight approach
SIGGRAPH '05 ACM SIGGRAPH 2005 Educators program
Sequoia: programming the memory hierarchy
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
A memory model for scientific algorithms on graphics processors
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
GPU accelerated molecular dynamics simulation of thermal conductivities
Journal of Computational Physics
Teaching graphics with the OpenGL shading language
Proceedings of the 38th SIGCSE technical symposium on Computer science education
Real-time mesh simplification using the GPU
Proceedings of the 2007 symposium on Interactive 3D graphics and games
Interactive k-d tree GPU raytracing
Proceedings of the 2007 symposium on Interactive 3D graphics and games
Interpreting the data: Parallel analysis with Sawzall
Scientific Programming - Dynamic Grids and Worldwide Computing
Proceedings of the 4th international conference on Computing frontiers
Proceedings of the 4th international conference on Computing frontiers
EXOCHI: architecture and programming environment for a heterogeneous multi-core multithreaded system
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Broad new OS research: challenges and opportunities
HOTOS'05 Proceedings of the 10th conference on Hot Topics in Operating Systems - Volume 10
Stream execution on wide-issue clustered VLIW architectures
Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Tradeoff between data-, instruction-, and thread-level parallelism in stream processors
Proceedings of the 21st annual international conference on Supercomputing
Cell placement on graphics processing units
Proceedings of the 20th annual conference on Integrated circuits and systems design
Status report: the Manticore project
ML '07 Proceedings of the 2007 Workshop on ML
Streaming Algorithms for Biological Sequence Alignment on GPUs
IEEE Transactions on Parallel and Distributed Systems
Multi-Level Graph Layout on the GPU
IEEE Transactions on Visualization and Computer Graphics
Scout: a data-parallel programming language for graphics processors
Parallel Computing
Cache-efficient numerical algorithms using graphics hardware
Parallel Computing
Exploring weak scalability for FEM calculations on a GPU-enhanced cluster
Parallel Computing
Massive parallel LDPC decoding on GPU
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
A portable runtime interface for multi-level memory hierarchies
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Merge: a programming model for heterogeneous multi-core systems
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Streamware: programming general-purpose multicore processors using streams
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Larrabee: a many-core x86 architecture for visual computing
ACM SIGGRAPH 2008 papers
BSGP: bulk-synchronous GPU programming
ACM SIGGRAPH 2008 papers
Application development on hybrid systems
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Scalable Parallel Programming with CUDA
Queue - GPU Computing
GPU acceleration of cutoff pair potentials for molecular modeling applications
Proceedings of the 5th conference on Computing frontiers
Visions for application development on hybrid computing systems
Parallel Computing
Parallel mapping algorithms for a novel mapping & configuration software for the FACETS project
CEA'08 Proceedings of the 2nd WSEAS International Conference on Computer Engineering and Applications
Advanced collective communication in Aspen
Proceedings of the 22nd annual international conference on Supercomputing
A compiler framework for optimization of affine loop nests for GPGPUs
Proceedings of the 22nd annual international conference on Supercomputing
Orchestrating the execution of stream programs on multicore platforms
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Design and evaluation of a compiler for embedded stream programs
Proceedings of the 2008 ACM SIGPLAN-SIGBED conference on Languages, compilers, and tools for embedded systems
Relational joins on graphics processors
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
StoreGPU: exploiting graphics processing units to accelerate distributed storage systems
HPDC '08 Proceedings of the 17th international symposium on High performance distributed computing
A stream chip-multiprocessor for bioinformatics
ACM SIGARCH Computer Architecture News
A lightweight streaming layer for multicore execution
ACM SIGARCH Computer Architecture News
Data parallel execution challenges and runtime performance of agent simulations on GPUs
Proceedings of the 2008 Spring simulation multiconference
Parallel programming models overview
ACM SIGGRAPH 2008 classes
Scalable parallel programming with CUDA
ACM SIGGRAPH 2008 classes
Implicitly-threaded parallelism in Manticore
Proceedings of the 13th ACM SIGPLAN international conference on Functional programming
A performance study of general-purpose applications on graphics processors using CUDA
Journal of Parallel and Distributed Computing
Bandwidth intensive 3-D FFT kernel for GPUs using CUDA
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Adapting a message-driven parallel application to GPU-accelerated clusters
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Proceedings of the 23rd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware
AES Encryption Implementation and Analysis on Commodity Graphics Processing Units
CHES '07 Proceedings of the 9th international workshop on Cryptographic Hardware and Embedded Systems
Systematic Parallelization of Medical Image Reconstruction for Graphics Hardware
Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
Optimus: efficient realization of streaming applications on FPGAs
CASES '08 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
TeraFLOP computing on a desktop PC with GPUs for 3D CFD
International Journal of Computational Fluid Dynamics - Mesoscopic Methods And Their Applications To CFD
GPU for Parallel On-Board Hyperspectral Image Processing
International Journal of High Performance Computing Applications
Exploiting loop-dependent stream reuse for stream processors
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Pangaea: a tightly-coupled IA32 heterogeneous chip multiprocessor
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Mars: a MapReduce framework on graphics processors
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
A tuning framework for software-managed memory hierarchies
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Large calculation of the flow over a hypersonic vehicle using a GPU
Journal of Computational Physics
Using GPUs to improve multigrid solver performance on a cluster
International Journal of Computational Science and Engineering
GRAMPS: A programming model for graphics pipelines
ACM Transactions on Graphics (TOG)
Mutual Information Based Semi-Global Stereo Matching on the GPU
ISVC '08 Proceedings of the 4th International Symposium on Advances in Visual Computing
Spatial sound for video games and virtual environments utilizing real-time GPU-based convolution
Future Play '08 Proceedings of the 2008 Conference on Future Play: Research, Play, Share
Programmable transitions for video stream editing
Proceedings of the 6th International Conference on Computer Graphics, Virtual Reality, Visualisation and Interaction in Africa
hiCUDA: a high-level directive-based language for GPU programming
Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units
Architecture-aware optimization targeting multithreaded stream computing
Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units
Optimizing the parallel computation of linear recurrences using compact matrix representations
Journal of Parallel and Distributed Computing
An intelligent semi-automatic application porting system for application accelerators
Proceedings of the combined workshops on UnConventional high performance computing workshop plus memory access workshop
High-performance SIMT code generation in an active visual effects library
Proceedings of the 6th ACM conference on Computing frontiers
A Unified Runtime System for Heterogeneous Multi-core Architectures
Euro-Par 2008 Workshops - Parallel Processing
Stream processing for fast and efficient rotated Haar-like features using rotated integral images
International Journal of Intelligent Systems Technologies and Applications
SLIPstream: scalable low-latency interactive perception on streaming data
Proceedings of the 18th international workshop on Network and operating systems support for digital audio and video
How GPUs can outperform ASICs for fast LDPC decoding
Proceedings of the 23rd international conference on Supercomputing
Using many-core hardware to correlate radio astronomy signals
Proceedings of the 23rd international conference on Supercomputing
Synergistic execution of stream programs on multicores with accelerators
Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Dynamic warp formation: Efficient MIMD control flow on SIMD graphics hardware
ACM Transactions on Architecture and Code Optimization (TACO)
The Canals language and its compiler
Proceedings of the 12th International Workshop on Software and Compilers for Embedded Systems
Software Pipelined Execution of Stream Programs on GPUs
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
Stream Compilation for Real-Time Embedded Multicore Systems
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
Concurrent number cruncher: a GPU implementation of a general sparse linear solver
International Journal of Parallel, Emergent and Distributed Systems
Solving Sparse Linear Systems on NVIDIA Tesla GPUs
ICCS '09 Proceedings of the 9th International Conference on Computational Science: Part I
Probing biomolecular machines with graphics processors
Communications of the ACM - A View of Parallel Computing
Precomputation-Based Rendering
Foundations and Trends® in Computer Graphics and Vision
Modelling and programming stream-based distributed computing based on the meta-pipeline approach
International Journal of Parallel, Emergent and Distributed Systems - Advances in Parallel and Distributed Computational Models
Automatic parallelization for graphics processing units
PPPJ '09 Proceedings of the 7th International Conference on Principles and Practice of Programming in Java
On GPU's viability as a middleware accelerator
Cluster Computing
Real-time Visual Tracker by Stream Processing
Journal of Signal Processing Systems
Journal of Signal Processing Systems
Nodal discontinuous Galerkin methods on graphics processors
Journal of Computational Physics
Debugging GPU stream programs through automatic dataflow recording and visualization
ACM SIGGRAPH Asia 2009 papers
Relational query coprocessing on graphics processors
ACM Transactions on Database Systems (TODS)
Probing Biomolecular Machines with Graphics Processors
Queue - Bioscience
Flexible filters: load balancing through backpressure for stream programs
EMSOFT '09 Proceedings of the seventh ACM international conference on Embedded software
A platform for developing adaptable multicore applications
CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
Streaming HD H.264 encoder on programmable processors
MM '09 Proceedings of the 17th ACM international conference on Multimedia
Accelerating geoscience and engineering system simulations on graphics hardware
Computers & Geosciences
PDCS '07 Proceedings of the 19th IASTED International Conference on Parallel and Distributed Computing and Systems
Increasing memory miss tolerance for SIMD cores
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Using the graphics processor unit to realize data streaming operations
Proceedings of the 6th Middleware Doctoral Symposium
Compiler support for general-purpose computation on GPUs
The Journal of Supercomputing
ACM SIGGRAPH 2009 Courses
Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Input-driven dynamic execution prediction of streaming applications
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Optimizations and Performance of a Robotics Grasping Algorithm Described in Geometric Algebra
CIARP '09 Proceedings of the 14th Iberoamerican Conference on Pattern Recognition: Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications
A Framework for Object-Oriented Shader Design
ISVC '09 Proceedings of the 5th International Symposium on Advances in Visual Computing: Part I
A Skeletal Parallel Framework with Fusion Optimizer for GPGPU Programming
APLAS '09 Proceedings of the 7th Asian Symposium on Programming Languages and Systems
Comparison of two real-time image processing system approaches
CGIM '08 Proceedings of the Tenth IASTED International Conference on Computer Graphics and Imaging
Compiling Python to a hybrid execution environment
Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units
MacroSS: macro-SIMDization of streaming applications
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Parallel LDPC decoding on GPUs using a stream-based computing approach
Journal of Computer Science and Technology - Special section on trust and reputation management in future computing systems and applications
SP@CE: an SP-based programming model for consumer electronics streaming applications
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Expression and loop libraries for high-performance code synthesis
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Minimizing communication in rate-optimal software pipelining for stream programs
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Application-guided tool development for architecturally diverse computation
Proceedings of the 2010 ACM Symposium on Applied Computing
Stream image processing on a dual-core embedded system
SAMOS'07 Proceedings of the 7th international conference on Embedded computer systems: architectures, modeling, and simulation
Symmetric key cryptography on modern graphics hardware
ASIACRYPT'07 Proceedings of the Advances in Cryptology 13th international conference on Theory and application of cryptology and information security
Accelerating space variant Gaussian filtering on graphics processing unit
EUROCAST'07 Proceedings of the 11th international conference on Computer aided systems theory
Efficient implementation of GPGPU synchronization primitives on CPUs
Proceedings of the 7th ACM international conference on Computing frontiers
State-of-the-art in heterogeneous computing
Scientific Programming
Proceedings of the 3rd International Workshop on Multicore Software Engineering
Stream processing of geometric and central moments using high precision summed area tables
ICONIP'08 Proceedings of the 15th international conference on Advances in neuro-information processing - Volume Part I
Exploiting the reuse supplied by loop-dependent stream references for stream processors
ACM Transactions on Architecture and Code Optimization (TACO)
An MPI-Stream Hybrid Programming Model for Computational Clusters
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
Run-time optimizations for replicated dataflows on heterogeneous environments
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
MapCG: writing parallel program portable between CPU and GPU
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Accelerating POCS interpolation of 3D irregular seismic data with Graphics Processing Units
Computers & Geosciences
Fast bio-inspired computation using a GPU-based systemic computer
Parallel Computing
Reuse-aware modulo scheduling for stream processors
Proceedings of the Conference on Design, Automation and Test in Europe
Compilation of stream programs for multicore processors that incorporate scratchpad memories
Proceedings of the Conference on Design, Automation and Test in Europe
Journal of Signal Processing Systems
memCUDA: map device memory to host memory on GPGPU platform
NPC'10 Proceedings of the 2010 IFIP international conference on Network and parallel computing
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
An efficient implementation of GPU virtualization in high performance clusters
Euro-Par'09 Proceedings of the 2009 international conference on Parallel processing
Systemic computation using graphics processors
ICES'10 Proceedings of the 9th international conference on Evolvable systems: from biology to hardware
Automatically translating a general purpose C++ image processing library for GPUs
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
GPU-ABiSort: optimal parallel sorting on stream architectures
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
A code motion technique for accelerating general-purpose computation on the GPU
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
MPEG-2 decoding in a stream programming language
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Piccolo: building fast, distributed programs with partitioned tables
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Simple optimizations for an applicative array language for graphics processors
Proceedings of the sixth workshop on Declarative aspects of multicore programming
Streaming Data Movement for Real-Time Image Analysis
Journal of Signal Processing Systems
Acceleration of a CFD code with a GPU
Scientific Programming - Exploring Languages for Expressing Medium to Massive On-Chip Parallelism
A taxonomy of accelerator architectures and their programming models
IBM Journal of Research and Development
Programming the memory hierarchy revisited: supporting irregular parallelism in Sequoia
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
A domain-specific approach to heterogeneous parallelism
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Bothnia: a dual-personality extension to the Intel integrated graphics driver
ACM SIGOPS Operating Systems Review
Orchestration by approximation: mapping stream programs onto multicore architectures
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Sponge: portable stream programming on graphics engines
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Power and Performance Characterization of Computational Kernels on the GPU
GREENCOM-CPSCOM '10 Proceedings of the 2010 IEEE/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber, Physical and Social Computing
Parallel implementation of a spatio-temporal visual saliency model
Journal of Real-Time Image Processing
Analyzing program flow within a many-kernel OpenCL application
Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units
VECPAR'10 Proceedings of the 9th international conference on High performance computing for computational science
Acceleration of acoustic emission signal processing algorithms using CUDA standard
Computer Standards & Interfaces
Journal of Signal Processing Systems
Implicitly threaded parallelism in Manticore
Journal of Functional Programming
A programming model for GPU-based parallel computing with scalability and abstraction
Proceedings of the 25th Spring Conference on Computer Graphics
A static task partitioning approach for heterogeneous systems using OpenCL
CC'11/ETAPS'11 Proceedings of the 20th international conference on Compiler construction: part of the joint European conferences on theory and practice of software
Automatic CPU-GPU communication management and optimization
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators
Proceedings of the 38th annual international symposium on Computer architecture
MPOpt-Cell: a high-performance data-flow programming environment for the CELL BE processor
Proceedings of the 8th ACM International Conference on Computing Frontiers
GPU computation in bioinspired algorithms: a review
IWANN'11 Proceedings of the 11th international conference on Artificial neural networks conference on Advances in computational intelligence - Volume Part I
Computing prestack Kirchhoff time migration on general purpose GPU
Computers & Geosciences
Accelerating code on multi-cores with FastFlow
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II
A fast, GPU based, dictionary attack to OpenPGP secret keyrings
Journal of Systems and Software
PTask: operating system abstractions to manage GPUs as compute devices
SOSP '11 Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
Obsidian: a domain specific embedded language for parallel programming of graphics processors
IFL'08 Proceedings of the 20th international conference on Implementation and application of functional languages
Firepile: run-time compilation for GPUs in Scala
Proceedings of the 10th ACM international conference on Generative programming and component engineering
High Performance Hybrid Functional Petri Net Simulations of Biological Pathway Models on CUDA
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
A survey of medical image registration on graphics hardware
Computer Methods and Programs in Biomedicine
A Fast Iterative Method for Solving the Eikonal Equation on Triangulated Surfaces
SIAM Journal on Scientific Computing
The Journal of Supercomputing
ISVC'06 Proceedings of the Second international conference on Advances in Visual Computing - Volume Part II
Expressing pipeline parallelism using TBB constructs: a case study on what works and what doesn't
Proceedings of the compilation of the co-located workshops on DSM'11, TMC'11, AGERE!'11, AOOPES'11, NEAT'11, & VMIL'11
An introduction to GPU accelerated surgical simulation
ISBMS'06 Proceedings of the Third international conference on Biomedical Simulation
Initial experiences porting a bioinformatics application to a graphics processor
PCI'05 Proceedings of the 10th Panhellenic conference on Advances in Informatics
HiPC'05 Proceedings of the 12th international conference on High Performance Computing
The development of the data-parallel GPU programming language CGiS
ICCS'06 Proceedings of the 6th international conference on Computational Science - Volume Part IV
A GPU implementation of level set multiview stereo
ICCS'06 Proceedings of the 6th international conference on Computational Science - Volume Part IV
Boosted algorithms for visual object detection on graphics processing units
ACCV'06 Proceedings of the 7th Asian conference on Computer Vision - Volume Part II
Optimizing modulo scheduling to achieve reuse and concurrency for stream processors
The Journal of Supercomputing
ACM Transactions on Architecture and Code Optimization (TACO)
Scheduling of synchronous data flow models on scratchpad memory based embedded processors
Proceedings of the International Conference on Computer-Aided Design
Cryptographics: secret key cryptography using graphics cards
CT-RSA'05 Proceedings of the 2005 international conference on Topics in Cryptology
Chestnut: a GPU programming language for non-experts
Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores
Scalable framework for mapping streaming applications onto multi-GPU systems
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Efficient deadlock avoidance for streaming computation with filtering
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Accelerating protein structure recovery using graphics processing units
ICCS'05 Proceedings of the 5th international conference on Computational Science - Volume Part I
Accelerated 2d image processing on GPUs
ICCS'05 Proceedings of the 5th international conference on Computational Science - Volume Part II
Mapping streaming languages to general purpose processors through vectorization
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Safe and familiar multi-core programming by means of a hybrid functional and imperative language
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Implementing p systems parallelism by means of GPUs
WMC'09 Proceedings of the 10th international conference on Membrane Computing
A universal calculus for stream processing languages
ESOP'10 Proceedings of the 19th European conference on Programming Languages and Systems
The CGiS compiler—a tool demonstration
CC'06 Proceedings of the 15th international conference on Compiler Construction
Towards cost-effective bio-inspired optimization: a prospective study on the GPU architecture
SEMCCO'11 Proceedings of the Second international conference on Swarm, Evolutionary, and Memetic Computing - Volume Part II
Characteristics of workloads using the pipeline programming model
ISCA'10 Proceedings of the 2010 international conference on Computer Architecture
StreamX10: a stream programming framework on X10
Proceedings of the 2012 ACM SIGPLAN X10 Workshop
Profile-guided deployment of stream programs on multicores
Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems
Compiling a high-level language for GPUs: (via language support for architectures and compilers)
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Adaptive input-aware compilation for graphics engines
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Characterizing and improving the use of demand-fetched caches in GPUs
Proceedings of the 26th ACM international conference on Supercomputing
A hierarchical component model for large parallel interactive applications
The Journal of Supercomputing
Elastic computing: A portable optimization framework for hybrid computers
Parallel Computing
Optimizing dataflow applications on heterogeneous environments
Cluster Computing
Experiences with high-level programming directives for porting applications to GPUs
Facing the Multicore-Challenge II
Parakeet: a just-in-time parallel accelerator for python
HotPar'12 Proceedings of the 4th USENIX conference on Hot Topics in Parallelism
Softshell: dynamic scheduling on GPUs
ACM Transactions on Graphics (TOG) - Proceedings of ACM SIGGRAPH Asia 2012
GPUstore: harnessing GPU computing for storage systems in the OS kernel
Proceedings of the 5th Annual International Systems and Storage Conference
A low-overhead dedicated execution support for stream applications on shared-memory cmp
Proceedings of the tenth ACM international conference on Embedded software
The RACECAR heuristic for automatic function specialization on multi-core heterogeneous systems
Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems
GPU accelerated normalized mutual information and B-spline transformation
EG VCBM'08 Proceedings of the First Eurographics conference on Visual Computing for Biomedicine
CUDASA: compute unified device and systems architecture
EG PGV'08 Proceedings of the 8th Eurographics conference on Parallel Graphics and Visualization
Early evaluation of directive-based GPU programming models for productive exascale computing
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Automatic generation of software pipelines for heterogeneous parallel systems
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
A fast implementation of the octagon abstract domain on graphics hardware
SAS'07 Proceedings of the 14th international conference on Static Analysis
Visualization for the Physical Sciences
Computer Graphics Forum
Concurrent number cruncher: an efficient sparse linear solver on the GPU
HPCC'07 Proceedings of the Third international conference on High Performance Computing and Communications
Sigma*: symbolic learning of input-output specifications
POPL '13 Proceedings of the 40th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
The Journal of Supercomputing
StreamTMC: Stream compilation for tiled multi-core architectures
Journal of Parallel and Distributed Computing
A compiler infrastructure for embedded heterogeneous MPSoCs
Proceedings of the 2013 International Workshop on Programming Models and Applications for Multicores and Manycores
CAP: co-scheduling based on asymptotic profiling in CPU+GPU hybrid systems
Proceedings of the 2013 International Workshop on Programming Models and Applications for Multicores and Manycores
GPUfs: integrating a file system with GPUs
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Parallel execution of Java loops on Graphics Processing Units
Science of Computer Programming
Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
Embassies: radically refactoring the web
nsdi'13 Proceedings of the 10th USENIX conference on Networked Systems Design and Implementation
Dynamic expressivity with static optimization for streaming languages
Proceedings of the 7th ACM international conference on Distributed event-based systems
Exploring the Tradeoffs between Programmability and Efficiency in Data-Parallel Accelerators
ACM Transactions on Computer Systems (TOCS)
An efficient scheduling scheme using estimated execution time for heterogeneous computing systems
The Journal of Supercomputing
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
ACM SIGOPS 24th Symposium on Operating Systems Principles
Dandelion: a compiler and runtime for heterogeneous systems
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
The shape of things to run: compiling complex stream graphs to reconfigurable hardware in lime
ECOOP'13 Proceedings of the 27th European conference on Object-Oriented Programming
Weir: a streaming language for performance analysis
Proceedings of the Seventh Workshop on Programming Languages and Operating Systems
A catalog of stream processing optimizations
ACM Computing Surveys (CSUR)
Exploiting Task- and Data-Level Parallelism in Streaming Applications Implemented in FPGAs
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Scheduling of synchronous data flow models onto scratchpad memory-based embedded processors
ACM Transactions on Embedded Computing Systems (TECS) - Special Section on ESTIMedia'10
Flexible filters in stream programs
ACM Transactions on Embedded Computing Systems (TECS)
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
GPUfs: Integrating a file system with GPUs
ACM Transactions on Computer Systems (TOCS)
OmpSs@Zynq all-programmable SoC ecosystem
Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays
StreaMorph: a case for synthesizing energy-efficient adaptive programs using high-level abstractions
Proceedings of the Eleventh ACM International Conference on Embedded Software
Accelerated finite element elastodynamic simulations using the GPU
Journal of Computational Physics
Optimising space exploration of OpenCL for GPGPUs
International Journal of Computational Science and Engineering
Population-based harmony search using GPU applied to protein structure prediction
International Journal of Computational Science and Engineering
CPU+GPU scheduling with asymptotic profiling
Parallel Computing
A compiler infrastructure for embedded heterogeneous MPSoCs
Parallel Computing
In this paper, we present Brook for GPUs, a system for general-purpose computation on programmable graphics hardware. Brook extends C with simple data-parallel constructs, enabling the use of the GPU as a streaming coprocessor. We present a compiler and runtime system that abstracts and virtualizes many aspects of graphics hardware. In addition, we analyze the effectiveness of the GPU as a compute engine compared to the CPU, to determine when the GPU can outperform the CPU for a particular algorithm. We evaluate our system with five applications: the SAXPY and SGEMV BLAS operators, image segmentation, FFT, and ray tracing. For these applications, we demonstrate that our Brook implementations perform comparably to hand-written GPU code and up to seven times faster than their CPU counterparts.
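As a rough illustration of the data-parallel style the abstract describes, SAXPY computes alpha * x + y element by element over two input vectors. The sketch below is plain C, not Brook syntax, and it runs on the CPU. The names `saxpy_kernel` and `saxpy_stream` are hypothetical and chosen only for this example. It assumes a model in which a per-element function is applied independently to every position of its input streams, which is the property that would let a streaming coprocessor evaluate the elements in parallel.

```c
#include <assert.h>
#include <stddef.h>

/* Per-element "kernel" (hypothetical name): consumes one element from each
   input stream and produces one output element. It cannot see neighbouring
   elements, so every invocation is independent of the others. */
static float saxpy_kernel(float alpha, float x, float y)
{
    return alpha * x + y;
}

/* CPU stand-in for applying the kernel over whole streams (hypothetical
   name). The loop order is irrelevant because the iterations do not depend
   on each other; a streaming processor could run them concurrently. */
void saxpy_stream(size_t n, float alpha,
                  const float *x, const float *y, float *result)
{
    assert(n == 0 || (x != NULL && y != NULL && result != NULL));
    for (size_t i = 0; i < n; ++i) {
        result[i] = saxpy_kernel(alpha, x[i], y[i]);
    }
}
```

Calling `saxpy_stream(4, 2.0f, x, y, r)` with `x = {1, 2, 3, 4}` and `y = {10, 20, 30, 40}` fills `r` with `{12, 24, 36, 48}`. Separating the per-element function from the loop that drives it mirrors the split between a kernel and the runtime that schedules it, though how Brook itself expresses this is not shown here.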