ad-heap: an Efficient Heap Data Structure for Asymmetric Multicore Processors

Authors:
Weifeng Liu;Brian Vinter
Affiliations:
Niels Bohr Institute, University of Copenhagen, Copenhagen, Denmark;Niels Bohr Institute, University of Copenhagen, Copenhagen, Denmark
Venue:
Proceedings of Workshop on General Purpose Processing Using GPUs
Year:
2014

Citing 32
Cited 0

Data structures and algorithm analysis (2nd ed.)

Data structures and algorithm analysis (2nd ed.)
The influence of caches on the performance of heaps

Journal of Experimental Algorithmics (JEA)
Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance

Proceedings of the 31st annual international symposium on Computer architecture
C++ Template Metaprogramming: Concepts, Tools, and Techniques from Boost and Beyond (C++ in Depth Series)

C++ Template Metaprogramming: Concepts, Tools, and Techniques from Boost and Beyond (C++ in Depth Series)
Mitigating Amdahl's Law through EPI Throttling

Proceedings of the 32nd annual international symposium on Computer Architecture
Using SIMD registers and instructions to enable instruction-level parallelism in sorting algorithms

Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Amdahl's Law in the Multicore Era

Computer
Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
COMPASS: a programmable data prefetcher using idle GPU shaders

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Chameleon: Virtualizing idle acceleration cores of a heterogeneous multicore processor for caching and prefetching

ACM Transactions on Architecture and Code Optimization (TACO)
Single-Chip Heterogeneous Computing: Does the Future Include Custom Logic, FPGAs, and GPGPUs?

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
StarPU: a unified platform for task scheduling on heterogeneous multicore architectures

Concurrency and Computation: Practice & Experience - Euro-Par 2009
On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing

SAAHPC '11 Proceedings of the 2011 Symposium on Application Accelerators in High-Performance Computing
GPUs and the Future of Parallel Computing

IEEE Micro
CPU-assisted GPGPU on fused CPU-GPU architectures

HPCA '12 Proceedings of the 2012 IEEE 18th International Symposium on High-Performance Computer Architecture
AMD Fusion APU: Llano

IEEE Micro
The tradeoffs of fused memory hierarchies in heterogeneous computing architectures

Proceedings of the 9th conference on Computing Frontiers
Scheduling heterogeneous multi-cores through Performance Impact Estimation (PIE)

Proceedings of the 39th Annual International Symposium on Computer Architecture
Automatic generation of software pipelines for heterogeneous parallel systems

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Accelerating MapReduce on a coupled CPU-GPU architecture

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Understanding fundamental design choices in single-ISA heterogeneous multicore architectures

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
A dynamic self-scheduling scheme for heterogeneous multiprocessor architectures

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Power efficiency evaluation of block ciphers on GPU-integrated multicore processor

ICA3PP'12 Proceedings of the 12th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I
Redefining the Role of the CPU in the Era of CPU-GPU Integration

IEEE Micro
Parallel suffix array and least common prefix for the GPU

Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
Valar: a benchmark suite to study the dynamic behavior of heterogeneous systems

Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units
Accelerating simulation of agent-based models on heterogeneous architectures

Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units
Exploiting Coarse-Grained Parallelism in B+ Tree Searches on an APU

SCC '12 Proceedings of the 2012 SC Companion: High Performance Computing, Networking Storage and Analysis
Glinda: a framework for accelerating imbalanced applications on heterogeneous platforms

Proceedings of the ACM International Conference on Computing Frontiers
Reducing GPU offload latency via fine-grained CPU-GPU synchronization

HPCA '13 Proceedings of the 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA)
An On-chip Heterogeneous Implementation of a General Sparse Linear Solver

IPDPSW '13 Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing Workshops and PhD Forum
Revisiting co-processing for hash joins on the coupled CPU-GPU architecture

Proceedings of the VLDB Endowment

Quantified Score

Hi-index	0.00

Visualization

Abstract

Heap is one of the most important fundamental data structures in computer science. Unfortunately, for a long time heaps did not obtain ideal performance gain from widely used throughput-oriented processors because of two reasons: (1) heap property decides that operations between any parent node and its child nodes must be executed sequentially, and (2) heaps, even d-heaps (d-ary heaps or d-way heaps), cannot supply enough wide data parallelism to these processors. Recent research proposed more versatile asymmetric multicore processors (AMPs) that consist of two types of cores (latency-oriented cores with high single-thread performance and throughput-oriented cores with wide vector processing capability), unified memory address space and faster synchronization mechanism among cores with different ISAs. To leverage the AMPs for the heap data structure, in this paper we propose ad-heap, an efficient heap data structure that introduces an implicit bridge structure and properly apportions workloads to the two types of cores. We implement a batch k-selection algorithm and conduct experiments on simulated AMP environments composed of real CPUs and GPUs. In our experiments on two representative platforms, the ad-heap obtains up to 1.5x and 3.6x speedup over the optimal AMP scheduling method that executes the fastest d-heaps on the standalone CPUs and GPUs in parallel.