This paper presents a many-core visual computing architecture code-named Larrabee, a new software rendering pipeline, a many-core programming model, and performance analysis for several applications. Larrabee uses multiple in-order x86 CPU cores that are augmented by a wide vector processor unit, as well as some fixed-function logic blocks. This provides dramatically higher performance per watt and per unit of area than out-of-order CPUs on highly parallel workloads. It also greatly increases the flexibility and programmability of the architecture as compared to standard GPUs. A coherent on-die second-level cache allows efficient inter-processor communication and high-bandwidth local data access by CPU cores. Task scheduling is performed entirely in software in Larrabee, rather than in fixed-function logic. The customizable software graphics rendering pipeline for this architecture uses binning in order to reduce required memory bandwidth, minimize lock contention, and increase opportunities for parallelism relative to standard GPUs. The Larrabee native programming model supports a variety of highly parallel applications that use irregular data structures. Performance analysis on those applications demonstrates Larrabee's potential for a broad range of parallel computation.
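The binning step mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `bin_triangles`, the 64-pixel tile size, and the conservative bounding-box overlap test are all assumptions chosen for illustration. The idea shown is only that primitives are first sorted into screen tiles, so that each tile can later be rendered independently from a small working set.

```python
from collections import defaultdict

TILE = 64  # assumed tile edge in pixels; illustrative, not taken from the paper


def bin_triangles(triangles, width, height, tile=TILE):
    """Map each (tx, ty) tile to the indices of triangles whose
    axis-aligned bounding box overlaps that tile (a conservative test)."""
    bins = defaultdict(list)
    max_tx = (width - 1) // tile
    max_ty = (height - 1) // tile
    for index, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Skip triangles whose bounding box lies entirely off screen.
        if max(xs) < 0 or max(ys) < 0 or min(xs) >= width or min(ys) >= height:
            continue
        # Clamp the bounding box to the range of valid tile coordinates.
        tx0 = max(int(min(xs)) // tile, 0)
        ty0 = max(int(min(ys)) // tile, 0)
        tx1 = min(int(max(xs)) // tile, max_tx)
        ty1 = min(int(max(ys)) // tile, max_ty)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins[(tx, ty)].append(index)
    return bins


# Hypothetical input: three triangles on a 128x128 render target.
triangles = [
    [(10, 10), (20, 10), (10, 20)],        # inside tile (0, 0)
    [(60, 60), (70, 60), (60, 70)],        # straddles four tiles
    [(-50, -50), (-10, -50), (-10, -10)],  # entirely off screen
]
bins = bin_triangles(triangles, 128, 128)
```

In this sketch, each tile's list can be handed to a separate worker. Because workers touch disjoint regions of the render target, they need no locks on pixel data, and a tile-sized working set is small enough to stay in cache, which is consistent with the bandwidth and contention benefits the abstract attributes to binning.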