Valar: a benchmark suite to study the dynamic behavior of heterogeneous systems

Authors:
Perhaad Mistry; Yash Ukidave; Dana Schaa; David Kaeli
Affiliations:
Northeastern University, Boston, MA; Northeastern University, Boston, MA; Northeastern University, Boston, MA; Northeastern University, Boston, MA
Venue:
Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units
Year:
2013

Abstract

Heterogeneous systems have grown in popularity within the commercial platform and application developer communities. A growing number of systems incorporate CPUs, Graphics Processors (GPUs), and Accelerated Processing Units (APUs, which combine a CPU and a GPU on the same chip). This emerging class of platforms is now being targeted to accelerate applications in which the host processor (typically a CPU) and the compute device (typically a GPU) cooperate on a computation. In this scenario, application performance depends not only on the processing power of the individual heterogeneous processors, but also on efficient interaction and communication between them. To help architects and application developers quantify key aspects of heterogeneous execution, this paper presents a new set of benchmarks called Valar. The Valar benchmarks are applications specifically chosen to study the dynamic behavior of OpenCL applications that will benefit from host-device interaction. We describe the general characteristics of our benchmarks, focusing on those that can help characterize heterogeneous applications. For the purposes of this paper we focus on OpenCL as our programming environment, though we envision versions of Valar in additional heterogeneous programming languages. We profile the Valar benchmarks based on their mapping and execution on different heterogeneous systems. Our evaluation examines optimizations for host-device communication and the effects of closely coupled execution of the benchmarks on the multiple OpenCL devices present in heterogeneous systems.