Accelerating simulation of agent-based models on heterogeneous architectures

Authors:
Jin Wang; Norman Rubin; Haicheng Wu; Sudhakar Yalamanchili
Affiliations:
Georgia Institute of Technology; Advanced Micro Devices; Georgia Institute of Technology; Georgia Institute of Technology
Venue:
Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units
Year:
2013

References (12):
Practical in-place mergesort. Nordic Journal of Computing.
GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation.
Efficient stream compaction on wide SIMD many-core architectures. Proceedings of the Conference on High Performance Graphics 2009.
An Efficient GPU Implementation for Large Scale Individual-Based Simulation of Collective Behavior. HIBI '09 Proceedings of the 2009 International Workshop on High Performance Computational Systems Biology.
Efficient simulation of agent-based models on multi-GPU and multi-core clusters. Proceedings of the 3rd International ICST Conference on Simulation Tools and Techniques.
Fast in-place sorting with CUDA based on bitonic sort. PPAM'09 Proceedings of the 8th international conference on Parallel processing and applied mathematics: Part I.
Agent-based computing from multi-agent systems to agent-based models: a visual survey. Scientometrics.
Designing APU Oriented Scientific Computing Applications in OpenCL. HPCC '11 Proceedings of the 2011 IEEE International Conference on High Performance Computing and Communications.
On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing. SAAHPC '11 Proceedings of the 2011 Symposium on Application Accelerators in High-Performance Computing.
GPUs and the Future of Parallel Computing. IEEE Micro.
Designing a unified programming model for heterogeneous machines. SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis.
Kernel Weaver: Automatically Fusing Database Primitives for Efficient GPU Computation. MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

Cited by (1):
ad-heap: an Efficient Heap Data Structure for Asymmetric Multicore Processors. Proceedings of Workshop on General Purpose Processing Using GPUs.

Abstract

The wide adoption of GPGPU programming models and compiler techniques has enabled the optimization of data-parallel programs on commodity GPUs. However, mapping GPGPU applications written for discrete GPUs onto emerging integrated heterogeneous architectures, such as the AMD Fusion APU and Intel Sandy Bridge/Ivy Bridge, which place the CPU and the GPU on the same die, has not been well studied. Classic time-step simulation applications, represented by agent-based models, have an intrinsically parallel structure that is a good fit for GPGPU architectures. When these applications are mapped directly to integrated GPUs, however, performance may degrade because the integrated GPU has fewer compute units and a lower clock speed. This paper proposes an optimization of the GPGPU implementation of the agent-based model and illustrates it with a traffic simulation example. The optimization adapts the algorithm by moving part of the workload to the CPU, leveraging the integrated architecture and its on-chip memory bus, which is faster than the PCIe bus that connects a discrete GPU to the host. Experiments on a discrete AMD Radeon GPU and an AMD Fusion APU demonstrate that the optimization achieves a 1.08x to 2.71x speedup on the integrated architecture over the discrete platform.
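
The idea of dividing one time step of an agent-based traffic model between two processors can be illustrated with a minimal sketch. The abstract does not say which part of the workload the paper moves to the CPU, so everything here is an assumption made for illustration: the deterministic car-following rule on a ring road, the split by agent index, and the names `update_range`, `step`, `simulate`, and `cpu_share` are all hypothetical. Both partitions run as plain Python on the host; the "GPU" path is only a stand-in for a device kernel.

```python
from typing import List, Tuple

State = Tuple[List[int], List[int]]  # (positions, speeds), cars listed in ring order


def update_range(pos: List[int], vel: List[int], lo: int, hi: int,
                 road_len: int, v_max: int) -> State:
    """Compute the next state of cars lo..hi-1 from the previous state only.

    Illustrative rule (an assumption, not the paper's model): accelerate by
    one cell per step, capped by v_max and by the free gap to the car ahead.
    """
    n = len(pos)
    new_pos, new_vel = [], []
    for i in range(lo, hi):
        ahead = pos[(i + 1) % n]
        gap = (ahead - pos[i] - 1) % road_len  # free cells in front of car i
        v = min(vel[i] + 1, v_max, gap)
        new_vel.append(v)
        new_pos.append((pos[i] + v) % road_len)
    return new_pos, new_vel


def step(pos: List[int], vel: List[int], road_len: int, v_max: int,
         cpu_share: float) -> State:
    """One time step, with the agents split into two partitions.

    Both partitions read the same previous state, so they are independent
    and could run concurrently on two processors that share memory.
    """
    n = len(pos)
    split = n - int(n * cpu_share)
    # Stand-in for the partition that a GPU kernel would process.
    gpu_pos, gpu_vel = update_range(pos, vel, 0, split, road_len, v_max)
    # Stand-in for the partition moved to the CPU.
    cpu_pos, cpu_vel = update_range(pos, vel, split, n, road_len, v_max)
    return gpu_pos + cpu_pos, gpu_vel + cpu_vel


def simulate(pos: List[int], vel: List[int], steps: int, road_len: int = 20,
             v_max: int = 3, cpu_share: float = 0.0) -> State:
    """Run the time-step loop for a fixed number of steps."""
    for _ in range(steps):
        pos, vel = step(pos, vel, road_len, v_max, cpu_share)
    return pos, vel


if __name__ == "__main__":
    start_pos, start_vel = [0, 3, 4, 9, 15], [0, 0, 0, 0, 0]
    print(simulate(start_pos, start_vel, steps=10, cpu_share=0.4))
```

Because each partition depends only on the previous time step, changing `cpu_share` changes who does the work but not the result. On a discrete GPU the two partitions would have to exchange state over PCIe every step; the abstract's argument is that a shared on-chip memory bus makes that exchange cheaper on an integrated part.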