OpenCL is becoming a popular choice for the parallel programming of both multi-core CPUs and GPGPUs. One feature missing from OpenCL, yet commonly required in irregular parallel applications, is dynamic memory allocation. In this paper, we propose KMA, the first dynamic memory allocator for OpenCL. KMA's design is based on a thorough analysis of a set of 11 algorithms, which shows that dynamic memory allocation is a necessary commodity, typically used for implementing complex data structures (arrays, lists, trees) that need constant restructuring at runtime. Taking into account both the survey findings and the status quo of OpenCL, we design KMA as a two-layer memory manager that makes smart use of the patterns we identified in our application analysis: its lower layer provides generic malloc() and free() APIs, while the higher layer provides support for building and efficiently managing dynamic data structures. Our experiments measure the performance and usability of KMA, using both microbenchmarks and a real-life case study. Results show that when dynamic allocation is mandatory, KMA is a competitive allocator. We conclude that embedding dynamic memory allocation in OpenCL is feasible, but that it is a complex, delicate task due to the massive parallelism of the platform and the portability issues between different OpenCL implementations.
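To make the two-layer idea concrete, the following is a minimal host-side sketch in C11, not KMA's actual implementation. It assumes a preallocated heap of fixed-size blocks whose ownership is claimed with an atomic compare-and-swap (the lower layer), and a singly linked list built on top of it (the higher layer). All names (`pool_malloc`, `pool_free`, `list_push`, `list_pop`) and the constants `BLOCK_SIZE` and `NUM_BLOCKS` are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define BLOCK_SIZE 64   /* bytes per block (hypothetical value) */
#define NUM_BLOCKS 256  /* blocks in the preallocated heap (hypothetical value) */

/* Lower layer: a preallocated heap plus one "in use" flag per block. */
static _Alignas(16) unsigned char heap[NUM_BLOCKS][BLOCK_SIZE];
static atomic_int in_use[NUM_BLOCKS];

/* Claim the first free block with a compare-and-swap.
   Returns NULL if the request exceeds BLOCK_SIZE or the heap is full. */
void *pool_malloc(size_t size) {
    if (size > BLOCK_SIZE) return NULL;
    for (int i = 0; i < NUM_BLOCKS; i++) {
        int expected = 0;
        if (atomic_compare_exchange_strong(&in_use[i], &expected, 1))
            return heap[i];
    }
    return NULL;
}

/* Release a block previously returned by pool_malloc. */
void pool_free(void *p) {
    if (p == NULL) return;
    ptrdiff_t off = (unsigned char *)p - &heap[0][0];
    assert(off >= 0 && off < (ptrdiff_t)sizeof heap && off % BLOCK_SIZE == 0);
    atomic_store(&in_use[off / BLOCK_SIZE], 0);
}

/* Higher layer: a singly linked list whose nodes come from the pool.
   The list operations themselves are not thread-safe in this sketch. */
typedef struct node {
    int value;
    struct node *next;
} node;

/* Prepend a value; returns 1 on success, 0 if the pool is exhausted. */
int list_push(node **head, int value) {
    node *n = pool_malloc(sizeof *n);
    if (n == NULL) return 0;
    n->value = value;
    n->next = *head;
    *head = n;
    return 1;
}

/* Remove the first value; returns 1 on success, 0 if the list is empty. */
int list_pop(node **head, int *value) {
    node *n = *head;
    if (n == NULL) return 0;
    *value = n->value;
    *head = n->next;
    pool_free(n);
    return 1;
}
```

A caller would start from `node *head = NULL;`, add elements with `list_push(&head, v)`, and remove them with `list_pop(&head, &v)`. In an OpenCL kernel the heap would presumably live in a global-memory buffer and the flags would be updated with the device's atomic built-ins; the linear scan used here is the simplest possible claim strategy and would likely need replacing under massive parallelism.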