The recently released Rodinia benchmark suite enables users to evaluate heterogeneous systems including both accelerators, such as GPUs, and multicore CPUs. As Rodinia sees higher levels of acceptance, it becomes important that researchers understand this new set of benchmarks, especially in how they differ from previous work. In this paper, we present recent extensions to Rodinia and conduct a detailed characterization of the Rodinia benchmarks (including performance results on an NVIDIA GeForce GTX480, the first product released based on the Fermi architecture). We also compare and contrast Rodinia with Parsec to gain insights into the similarities and differences of the two benchmark collections; we apply principal component analysis to analyze the application space coverage of the two suites. Our analysis shows that many of the workloads in Rodinia and Parsec are complementary, capturing different aspects of certain performance metrics.
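As a rough illustration of the principal component analysis step mentioned above, the following is a minimal sketch, not the authors' actual methodology. It assumes each workload is described by a row of numeric metrics. The workload names, the three metric columns, and all values are hypothetical placeholders invented for this example, and the helper `pca` is likewise illustrative.

```python
import numpy as np


def pca(X, n_components=2):
    """Project the rows of X onto the top principal components.

    Returns the projected scores and the fraction of variance
    explained by each retained component.
    """
    # Standardize each metric (column) to zero mean and unit variance,
    # so metrics with large numeric ranges do not dominate.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # SVD of the standardized matrix; the rows of Vt are the principal axes,
    # ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)
    scores = Z @ Vt[:n_components].T
    return scores, explained[:n_components]


# Hypothetical data: rows are workloads, columns are three made-up metrics.
# These numbers are placeholders, not measurements from any benchmark suite.
workloads = ["suite_a_w1", "suite_a_w2", "suite_b_w1", "suite_b_w2"]
X = np.array([
    [1.2, 0.05, 0.30],
    [0.8, 0.20, 0.10],
    [1.5, 0.02, 0.45],
    [0.6, 0.25, 0.05],
])

scores, explained = pca(X)
for name, (pc1, pc2) in zip(workloads, scores):
    print(f"{name}: PC1={pc1:+.2f} PC2={pc2:+.2f}")
```

Plotting the first two score columns gives a two-dimensional view of the workload space. Under the assumptions above, workloads that land far apart differ in the measured metrics, which is one way to judge how much of the space two suites cover between them.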