A characterization of the Rodinia benchmark suite with comparison to contemporary CMP workloads

Authors:
Shuai Che; Jeremy W. Sheaffer; Michael Boyer; Lukasz G. Szafaryn; Liang Wang; Kevin Skadron
Affiliations:
The University of Virginia, Department of Computer Science, USA (all authors)
Venue:
IISWC '10: Proceedings of the IEEE International Symposium on Workload Characterization
Year:
2010

Abstract

The recently released Rodinia benchmark suite enables users to evaluate heterogeneous systems including both accelerators, such as GPUs, and multicore CPUs. As Rodinia sees higher levels of acceptance, it becomes important that researchers understand this new set of benchmarks, especially in how they differ from previous work. In this paper, we present recent extensions to Rodinia and conduct a detailed characterization of the Rodinia benchmarks (including performance results on an NVIDIA GeForce GTX480, the first product released based on the Fermi architecture). We also compare and contrast Rodinia with Parsec to gain insights into the similarities and differences of the two benchmark collections; we apply principal component analysis to analyze the application space coverage of the two suites. Our analysis shows that many of the workloads in Rodinia and Parsec are complementary, capturing different aspects of certain performance metrics.