Scalable SIMD-parallel memory allocation for many-core machines

  • Authors:
  • Xiaohuang Huang; Christopher I. Rodrigues; Stephen Jones; Ian Buck; Wen-Mei Hwu

  • Affiliations:
  • University of Illinois at Urbana-Champaign, Urbana, USA 61801; University of Illinois at Urbana-Champaign, Urbana, USA 61801; NVIDIA Corporation, Santa Clara, USA 95050; NVIDIA Corporation, Santa Clara, USA 95050; University of Illinois at Urbana-Champaign, Urbana, USA 61801

  • Venue:
  • The Journal of Supercomputing
  • Year:
  • 2013

Abstract

Dynamic memory allocation is an important feature of modern programming systems. However, the cost of memory allocation in massively parallel execution environments such as CUDA has been too high for many types of kernels. This paper presents XMalloc, a high-throughput memory allocation mechanism that dramatically magnifies the allocation throughput of an underlying memory allocator. XMalloc embodies two key techniques: allocation coalescing and buffering using efficient queues. This paper describes these two techniques and presents our implementation of XMalloc as a memory allocator library. The library is designed to be called from kernels executed by massive numbers of threads. Our experimental results based on the NVIDIA G480 GPU show that XMalloc magnifies the allocation throughput of the underlying memory allocator by a factor of 48.
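
The abstract names allocation coalescing without describing its mechanics. As a rough illustration of the general idea only, the following host-side C++ sketch combines the requests of one group of SIMD lanes into a single call to an underlying allocator (here `std::malloc`) and carves the result into per-lane pieces using an exclusive prefix sum. All names (`CoalescedBlock`, `coalesced_alloc`, `release`, `align_up`) and the 16-byte default alignment are assumptions made for this example; none of them is the XMalloc API, and the paper's GPU implementation may differ substantially.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical types and names for illustration; not the XMalloc API.
struct CoalescedBlock {
    void* base;                   // the single allocation from the underlying allocator
    std::vector<void*> pointers;  // per-lane pointers carved out of base (nullptr for size 0)
};

// Round n up to a multiple of align (align must be a power of two).
inline std::size_t align_up(std::size_t n, std::size_t align) {
    return (n + align - 1) & ~(align - 1);
}

// Serve every request of one SIMD group with one underlying malloc call.
inline CoalescedBlock coalesced_alloc(const std::vector<std::size_t>& sizes,
                                      std::size_t align = 16) {
    assert(align != 0 && (align & (align - 1)) == 0);

    // Exclusive prefix sum of the padded sizes gives each lane its offset.
    std::vector<std::size_t> offsets(sizes.size());
    std::size_t total = 0;
    for (std::size_t lane = 0; lane < sizes.size(); ++lane) {
        offsets[lane] = total;
        total += align_up(sizes[lane], align);
    }

    CoalescedBlock block{nullptr, std::vector<void*>(sizes.size(), nullptr)};
    if (total == 0) return block;          // no lane requested any memory

    block.base = std::malloc(total);       // one allocation for the whole group
    if (block.base == nullptr) return block;

    char* bytes = static_cast<char*>(block.base);
    for (std::size_t lane = 0; lane < sizes.size(); ++lane) {
        if (sizes[lane] != 0) block.pointers[lane] = bytes + offsets[lane];
    }
    return block;
}

// Simplification: the whole block is released at once. Freeing pieces
// individually would need extra per-block bookkeeping, omitted here.
inline void release(CoalescedBlock& block) {
    std::free(block.base);
    block.base = nullptr;
    for (void*& p : block.pointers) p = nullptr;
}
```

In this sketch, four lanes requesting 8, 0, 24, and 100 bytes trigger one 160-byte `std::malloc` call instead of three separate ones, which is the kind of reduction in calls to the underlying allocator that coalescing aims for. The second technique named in the abstract, buffering with queues, is not shown.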