Automatically characterizing large scale program behavior
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Teleport messaging for distributed stream programs
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
SPEC CPU2006 benchmark descriptions
ACM SIGARCH Computer Architecture News
Sequoia: programming the memory hierarchy
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Sequoia: programming the memory hierarchy
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
The HPC Challenge (HPCC) benchmark suite
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Optimization principles and application performance evaluation of a multithreaded GPU using CUDA
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Mars: a MapReduce framework on graphics processors
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Accelerating SQL database operations on a GPU with CUDA
Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units
Web search using mobile cores: quantifying and mitigating the price of efficiency
Proceedings of the 37th annual international symposium on Computer architecture
Maestro: data orchestration and tuning for OpenCL devices
Euro-Par'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part II
Fidelity and scaling of the PARSEC benchmark inputs
IISWC '10 Proceedings of the IEEE International Symposium on Workload Characterization (IISWC'10)
A characterization of the Rodinia benchmark suite with comparison to contemporary CMP workloads
IISWC '10 Proceedings of the IEEE International Symposium on Workload Characterization (IISWC'10)
Exploiting Memory Access Patterns to Improve Memory Performance in Data-Parallel Architectures
IEEE Transactions on Parallel and Distributed Systems
Analyzing program flow within a many-kernel OpenCL application
Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units
Energy-efficient mechanisms for managing thread context in throughput processors
Proceedings of the 38th annual international symposium on Computer architecture
Where is the data? Why you cannot debate CPU vs. GPU performance without the answer
ISPASS '11 Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software
A family of real-time Java benchmarks
Concurrency and Computation: Practice & Experience
Efficient performance evaluation of memory hierarchy for highly multithreaded graphics processors
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Clearing the clouds: a study of emerging scale-out workloads on modern hardware
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
The tradeoffs of fused memory hierarchies in heterogeneous computing architectures
Proceedings of the 9th conference on Computing Frontiers
Characterizing and evaluating a key-value store application on heterogeneous CPU-GPU systems
ISPASS '12 Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software
Using cycle stacks to understand scaling bottlenecks in multi-threaded workloads
IISWC '11 Proceedings of the 2011 IEEE International Symposium on Workload Characterization
ad-heap: an Efficient Heap Data Structure for Asymmetric Multicore Processors
Proceedings of Workshop on General Purpose Processing Using GPUs
Hi-index | 0.00 |
Heterogeneous systems have grown in popularity within the commercial platform and application developer communities. We have seen a growing number of systems incorporating CPUs, Graphics Processors (GPUs) and Accelerated Processing Units (APUs combine a CPU and GPU on the same chip). These emerging class of platforms are now being targeted to accelerate applications where the host processor (typically a CPU) and compute device (typically a GPU) co-operate on a computation. In this scenario, the performance of the application is not only dependent on the processing power of the respective heterogeneous processors, but also on the efficient interaction and communication between them. To help architects and application developers to quantify many of the key aspects of heterogeneous execution, this paper presents a new set of benchmarks called the Valar. The Valar benchmarks are applications specifically chosen to study the dynamic behavior of OpenCL applications that will benefit from host-device interaction. We describe the general characteristics of our benchmarks, focusing on specific characteristics that can help characterize heterogeneous applications. For the purposes of this paper we focus on OpenCL as our programming environment, though we envision versions of Valar in additional heterogeneous programming languages. We profile the Valar benchmarks based on their mapping and execution on different heterogeneous systems. Our evaluation examines optimizations for host-device communication and the effects of closely-coupled execution of the benchmarks on the multiple OpenCL devices present in heterogeneous systems.