Modeling GPU-CPU workloads and systems
Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units
The Scalable Heterogeneous Computing (SHOC) benchmark suite
Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units
An OpenCL framework for heterogeneous multicores with local memory
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Ocelot: a dynamic optimization framework for bulk-synchronous applications in heterogeneous systems
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Throughput-Effective On-Chip Networks for Manycore Accelerators
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Many-Thread Aware Prefetching Mechanisms for GPGPU Applications
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Accelerating CUDA graph algorithms at maximum warp
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
On-the-fly elimination of dynamic irregularities for GPU computing
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Power and Performance Characterization of Computational Kernels on the GPU
GREENCOM-CPSCOM '10 Proceedings of the 2010 IEEE/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber, Physical and Social Computing
Quantifying NUMA and contention effects in multi-GPU systems
Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units
CnC-CUDA: declarative programming for GPUs
LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
Considering GPGPU for HPC centers: is it worth the effort?
Facing the multicore-challenge
Considering GPGPU for HPC centers: is it worth the effort?
Facing the multicore-challenge
MDR: performance model driven runtime for heterogeneous parallel platforms
Proceedings of the international conference on Supercomputing
Proceedings of the 38th annual international symposium on Computer architecture
Automatic OpenCL device characterization: guiding optimized kernel design
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II
Evaluation of an accelerator architecture for speckle reducing anisotropic diffusion
CASES '11 Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems
Dymaxion: optimizing memory access patterns for heterogeneous systems
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Exploring the limits of GPGPU scheduling in control flow bound applications
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Seamlessly portable applications: Managing the diversity of modern heterogeneous systems
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Identifying hotspots in a program for data parallel architecture: an early experience
Proceedings of the 5th India Software Engineering Conference
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Poster: determining code segments that can benefit from execution on GPUs
Proceedings of the 2011 companion on High Performance Computing Networking, Storage and Analysis Companion
Improving GPU performance via large warps and two-level warp scheduling
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
A compile-time managed multi-level register file hierarchy
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Reducing off-chip memory traffic by selective cache management scheme in GPGPUs
Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units
A Hierarchical Thread Scheduler and Register File for Energy-Efficient Throughput Processors
ACM Transactions on Computer Systems (TOCS)
The "Chimera": an off-the-shelf CPU/GPGPU/FPGA hybrid computing platform
International Journal of Reconfigurable Computing - Special issue on High-Performance Reconfigurable Computing
A methodology for energy-quality tradeoff using imprecise hardware
Proceedings of the 49th Annual Design Automation Conference
International Journal of High Performance Computing Applications
Thermal management of a many-core processor under fine-grained parallelism
Euro-Par'11 Proceedings of the 2011 international conference on Parallel Processing
Boosting single thread performance in mobile processors via reconfigurable acceleration
ARC'12 Proceedings of the 8th international conference on Reconfigurable Computing: architectures, tools and applications
Dynamically managed data for CPU-GPU architectures
Proceedings of the Tenth International Symposium on Code Generation and Optimization
Dynamic binary rewriting and migration for shared-ISA asymmetric, multicore processors: summary
Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
Characterizing and improving the use of demand-fetched caches in GPUs
Proceedings of the 26th ACM international conference on Supercomputing
One stone two birds: synchronization relaxation and redundancy removal in GPU-CPU translation
Proceedings of the 26th ACM international conference on Supercomputing
Scheduling Concurrent Applications on a Cluster of CPU-GPU Nodes
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Energy-efficient GPU design with reconfigurable in-package graphics memory
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
A systematic process for efficient execution on Intel's heterogeneous computation nodes
Proceedings of the 1st Conference of the Extreme Science and Engineering Discovery Environment: Bridging from the eXtreme to the campus and beyond
Simultaneous branch and warp interweaving for sustained GPU performance
Proceedings of the 39th Annual International Symposium on Computer Architecture
CAPRI: prediction of compaction-adequacy for handling control-divergence in GPGPU architectures
Proceedings of the 39th Annual International Symposium on Computer Architecture
Performance and productivity of new programming languages
Facing the Multicore-Challenge II
Parakeet: a just-in-time parallel accelerator for python
HotPar'12 Proceedings of the 4th USENIX conference on Hot Topics in Parallelism
Gdev: first-class GPU resource management in the operating system
USENIX ATC'12 Proceedings of the 2012 USENIX conference on Annual Technical Conference
SPEC OMP2012 -- an application benchmark suite for parallel systems using OpenMP
IWOMP'12 Proceedings of the 8th international conference on OpenMP in a Heterogeneous World
Feedback-Based global instruction scheduling for GPGPU applications
ICCSA'12 Proceedings of the 12th international conference on Computational Science and Its Applications - Volume Part I
Power-aware multi-core simulation for early design stage hardware/software co-optimization
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
RISE: improving the streaming processors reliability against soft errors in gpgpus
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Lossless and lossy memory I/O link compression for improving performance of GPGPU workloads
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Workload and power budget partitioning for single-chip heterogeneous processors
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Early evaluation of directive-based GPU programming models for productive exascale computing
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
ValuePack: value-based scheduling framework for CPU-GPU clusters
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Spill code placement for SIMD machines
SBLP'12 Proceedings of the 16th Brazilian conference on Programming Languages
Proceedings of the International Conference on Computer-Aided Design
OpenMPC: extended OpenMP for efficient programming and tuning on GPUs
International Journal of Computational Science and Engineering
Automatic problem size sensitive task partitioning on heterogeneous parallel systems
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
Data layout optimization for GPGPU architectures
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
Prius: a runtime for hybrid computing
Proceedings of the First International Workshop on Code OptimiSation for MultI and many Cores
Fast dynamic binary rewriting to support thread migration in shared-ISA asymmetric multicores
Proceedings of the First International Workshop on Code OptimiSation for MultI and many Cores
Efficient design space exploration of GPGPU architectures
Euro-Par'12 Proceedings of the 18th international conference on Parallel processing workshops
Inter-warp instruction temporal locality in deep-multithreaded GPUs
ARCS'13 Proceedings of the 26th international conference on Architecture of Computing Systems
GPUDet: a deterministic GPU architecture
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
OWL: cooperative thread array aware scheduling techniques for improving GPGPU performance
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Improving GPGPU concurrency with elastic kernels
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Cache-Conscious Wavefront Scheduling
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Unifying Primary Cache, Scratch, and Register File Memories in a Throughput Processor
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Warp size impact in GPUs: large or small?
Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units
An automatic input-sensitive approach for heterogeneous task partitioning
Proceedings of the 27th international ACM conference on International conference on supercomputing
Proceedings of the 27th international ACM conference on International conference on supercomputing
Scaling large-data computations on multi-GPU accelerators
Proceedings of the 27th international ACM conference on International conference on supercomputing
Portable mapping of openMP to multicore embedded systems using MCA APIs
Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
Performance characterization of data-intensive kernels on AMD Fusion architectures
Computer Science - Research and Development
Proceedings of the ACM International Conference on Computing Frontiers
Load balancing in a changing world: dealing with heterogeneity and performance variability
Proceedings of the ACM International Conference on Computing Frontiers
RFiof: an RF approach to I/O-pin and memory controller scalability for off-chip memories
Proceedings of the ACM International Conference on Computing Frontiers
Cost-effective soft-error protection for SRAM-based structures in GPGPUs
Proceedings of the ACM International Conference on Computing Frontiers
Using synchronization stalls in power-aware accelerators
Proceedings of the Conference on Design, Automation and Test in Europe
Characterizing the performance benefits of fused CPU/GPU systems using FusionSim
Proceedings of the Conference on Design, Automation and Test in Europe
Microarchitectural mechanisms to exploit value structure in SIMT architectures
Proceedings of the 40th Annual International Symposium on Computer Architecture
Exploring memory consistency for massively-threaded throughput-oriented processors
Proceedings of the 40th Annual International Symposium on Computer Architecture
Cooperative boosting: needy versus greedy power management
Proceedings of the 40th Annual International Symposium on Computer Architecture
Orchestrated scheduling and prefetching for GPGPUs
Proceedings of the 40th Annual International Symposium on Computer Architecture
Maximizing SIMD resource utilization in GPGPUs with SIMD lane permutation
Proceedings of the 40th Annual International Symposium on Computer Architecture
SIMD divergence optimization through intra-warp compaction
Proceedings of the 40th Annual International Symposium on Computer Architecture
GPUWattch: enabling energy optimizations in GPGPUs
Proceedings of the 40th Annual International Symposium on Computer Architecture
Criticality stacks: identifying critical threads in parallel programs using synchronization behavior
Proceedings of the 40th Annual International Symposium on Computer Architecture
Coordinated energy management in heterogeneous processors
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
A data-centric profiler for parallel programs
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Adaptive virtual channel partitioning for network-on-chip in heterogeneous architectures
ACM Transactions on Design Automation of Electronic Systems (TODAES) - Special Section on Networks on Chip: Architecture, Tools, and Methodologies
Barrier invariants: a shared state abstraction for the analysis of data-dependent GPU kernels
Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications
Designing on-chip networks for throughput accelerators
ACM Transactions on Architecture and Code Optimization (TACO)
Memory performance estimation of CUDA programs
ACM Transactions on Embedded Computing Systems (TECS) - Special issue on application-specific processors
Exploring hybrid memory for GPU energy efficiency through software-hardware co-design
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Neither more nor less: optimizing thread-level parallelism for GPGPUs
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
DANBI: dynamic scheduling of irregular stream programs for many-core systems
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Starchart: hardware and software optimization using recursive partitioning regression trees
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
ACM Transactions on Programming Languages and Systems (TOPLAS)
A measurement study of GPU DVFS on energy conservation
Proceedings of the Workshop on Power-Aware Computing and Systems
A sound and complete abstraction for reasoning about parallel prefix sums
Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages
Scheduling concurrent applications on a cluster of CPU-GPU nodes
Future Generation Computer Systems
Design space exploration of on-chip ring interconnection for a CPU-GPU heterogeneous architecture
Journal of Parallel and Distributed Computing
Exploiting GPU peak-power and performance tradeoffs through reduced effective pipeline latency
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
A locality-aware memory hierarchy for energy-efficient GPU architectures
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Divergence-aware warp scheduling
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Warped gates: gating aware scheduling and power gating for GPGPUs
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Heterogeneous system coherence for integrated CPU-GPU systems
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
ACM Transactions on Architecture and Code Optimization (TACO)
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Paraprox: pattern-based approximation for data parallel applications
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Rhythm: harnessing data parallel hardware for server workloads
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Heterogeneous-race-free memory models
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Portable and Transparent Host-Device Communication Optimization for GPGPU Environments
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
CUDA-NP: realizing nested thread-level parallelism in GPGPU applications
Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming
Selecting representative benchmark inputs for exploring microprocessor design spaces
ACM Transactions on Architecture and Code Optimization (TACO)
Optimization power consumption model of reliability-aware GPU clusters
The Journal of Supercomputing
An efficient compiler framework for cache bypassing on GPUs
Proceedings of the International Conference on Computer-Aided Design
Dynamic load balancing on heterogeneous multi-GPU systems
Computers and Electrical Engineering
An application-centric evaluation of OpenCL on multi-core CPUs
Parallel Computing
HARP: Harnessing inactive threads in many-core processors
ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers
KMA: A Dynamic Memory Manager for OpenCL
Proceedings of Workshop on General Purpose Processing Using GPUs
Proceedings of Workshop on General Purpose Processing Using GPUs
Power Modeling for Heterogeneous Processors
Proceedings of Workshop on General Purpose Processing Using GPUs
UCC '13 Proceedings of the 2013 IEEE/ACM 6th International Conference on Utility and Cloud Computing
Boosting CUDA Applications with CPU---GPU Hybrid Computing
International Journal of Parallel Programming
Hi-index | 0.00 |
This paper presents and characterizes Rodinia, a benchmark suite for heterogeneous computing. To help architects study emerging platforms such as GPUs (Graphics Processing Units), Rodinia includes applications and kernels which target multi-core CPU and GPU platforms. The choice of applications is inspired by Berkeley's dwarf taxonomy. Our characterization shows that the Rodinia benchmarks cover a wide range of parallel communication patterns, synchronization techniques and power consumption, and has led to some important architectural insight, such as the growing importance of memory-bandwidth limitations and the consequent importance of data layout.