Data structures and algorithm analysis (2nd ed.)
Data structures and algorithm analysis (2nd ed.)
The influence of caches on the performance of heaps
Journal of Experimental Algorithmics (JEA)
Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance
Proceedings of the 31st annual international symposium on Computer architecture
C++ Template Metaprogramming: Concepts, Tools, and Techniques from Boost and Beyond (C++ in Depth Series)
Mitigating Amdahl's Law through EPI Throttling
Proceedings of the 32nd annual international symposium on Computer Architecture
Using SIMD registers and instructions to enable instruction-level parallelism in sorting algorithms
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Amdahl's Law in the Multicore Era
Computer
Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
COMPASS: a programmable data prefetcher using idle GPU shaders
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
ACM Transactions on Architecture and Code Optimization (TACO)
Single-Chip Heterogeneous Computing: Does the Future Include Custom Logic, FPGAs, and GPGPUs?
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
StarPU: a unified platform for task scheduling on heterogeneous multicore architectures
Concurrency and Computation: Practice & Experience - Euro-Par 2009
On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing
SAAHPC '11 Proceedings of the 2011 Symposium on Application Accelerators in High-Performance Computing
GPUs and the Future of Parallel Computing
IEEE Micro
CPU-assisted GPGPU on fused CPU-GPU architectures
HPCA '12 Proceedings of the 2012 IEEE 18th International Symposium on High-Performance Computer Architecture
IEEE Micro
The tradeoffs of fused memory hierarchies in heterogeneous computing architectures
Proceedings of the 9th conference on Computing Frontiers
Scheduling heterogeneous multi-cores through Performance Impact Estimation (PIE)
Proceedings of the 39th Annual International Symposium on Computer Architecture
Automatic generation of software pipelines for heterogeneous parallel systems
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Accelerating MapReduce on a coupled CPU-GPU architecture
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Understanding fundamental design choices in single-ISA heterogeneous multicore architectures
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
A dynamic self-scheduling scheme for heterogeneous multiprocessor architectures
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Power efficiency evaluation of block ciphers on GPU-integrated multicore processor
ICA3PP'12 Proceedings of the 12th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I
Parallel suffix array and least common prefix for the GPU
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
Valar: a benchmark suite to study the dynamic behavior of heterogeneous systems
Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units
Accelerating simulation of agent-based models on heterogeneous architectures
Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units
Exploiting Coarse-Grained Parallelism in B+ Tree Searches on an APU
SCC '12 Proceedings of the 2012 SC Companion: High Performance Computing, Networking Storage and Analysis
Glinda: a framework for accelerating imbalanced applications on heterogeneous platforms
Proceedings of the ACM International Conference on Computing Frontiers
Reducing GPU offload latency via fine-grained CPU-GPU synchronization
HPCA '13 Proceedings of the 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA)
An On-chip Heterogeneous Implementation of a General Sparse Linear Solver
IPDPSW '13 Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing Workshops and PhD Forum
Revisiting co-processing for hash joins on the coupled CPU-GPU architecture
Proceedings of the VLDB Endowment
Hi-index | 0.00 |
Heap is one of the most important fundamental data structures in computer science. Unfortunately, for a long time heaps did not obtain ideal performance gain from widely used throughput-oriented processors because of two reasons: (1) heap property decides that operations between any parent node and its child nodes must be executed sequentially, and (2) heaps, even d-heaps (d-ary heaps or d-way heaps), cannot supply enough wide data parallelism to these processors. Recent research proposed more versatile asymmetric multicore processors (AMPs) that consist of two types of cores (latency-oriented cores with high single-thread performance and throughput-oriented cores with wide vector processing capability), unified memory address space and faster synchronization mechanism among cores with different ISAs. To leverage the AMPs for the heap data structure, in this paper we propose ad-heap, an efficient heap data structure that introduces an implicit bridge structure and properly apportions workloads to the two types of cores. We implement a batch k-selection algorithm and conduct experiments on simulated AMP environments composed of real CPUs and GPUs. In our experiments on two representative platforms, the ad-heap obtains up to 1.5x and 3.6x speedup over the optimal AMP scheduling method that executes the fastest d-heaps on the standalone CPUs and GPUs in parallel.