With the rise of general-purpose computing on graphics processing units (GPGPU), the influence of consumer markets can now be seen across the spectrum of computer architectures. In fact, many of the high-ranking Top500 HPC systems now include these accelerators. Traditionally, GPUs have connected to the CPU via the PCIe bus, which has proved to be a significant bottleneck for scalable scientific applications. Now, a trend toward tighter integration between CPU and GPU has removed this bottleneck and unified the memory hierarchy for both CPU and GPU cores. We examine the impact of this trend on high-performance scientific computing, using AMD's new Fusion Accelerated Processing Unit (APU) as a testbed. In particular, we evaluate the tradeoffs in performance, power consumption, and programmability when comparing this unified memory hierarchy with similar but discrete GPUs.