Memory management thread for heap allocation intensive sequential applications

Authors:
Devesh Tiwari;Sanghoon Lee;James Tuck;Yan Solihin
Affiliations:
North Carolina State University, Raleigh;North Carolina State University, Raleigh;North Carolina State University, Raleigh;North Carolina State University, Raleigh
Venue:
Proceedings of the 10th workshop on MEmory performance: DEaling with Applications, systems and architecture
Year:
2009

Abstract

Dynamic memory management is one of the most ubiquitous and expensive operations in many C/C++ applications; some programs spend up to one third of their execution time in dynamic memory management routines. With multicore processors now a mainstream architecture, it is important to investigate how dynamic memory management can exploit multicore parallelism to speed up sequential programs. In this paper, we propose a way to exploit multicore parallelism in dynamic memory management for sequential applications by spinning off memory allocation and deallocation functions to a separate thread, which we refer to as the memory management thread (MMT). The goal of this study is to show how an efficient design and implementation of MMT can improve performance without any algorithm-level or implementation-level knowledge of the memory management library being offloaded. Using heap allocation-intensive benchmarks, we evaluate MMT on an Intel Core 2 Quad platform with the widely used Doug Lea memory allocator. Without any modifications to application source code or to the memory management algorithm of the underlying allocator, our MMT approach achieves an average speedup of 1.19x, and 1.60x in the best case.
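
The offloading idea in the abstract can be illustrated with a small sketch. The code below is not the authors' MMT implementation, and the abstract does not describe MMT's internals. It is a minimal illustration, assuming a mutex-protected queue and a hypothetical class named DeferredFreeThread, of how a sequential program could hand deallocation to a helper thread so that the cost of free() leaves the application's critical path. It covers only deallocation and treats the allocator as a black box.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper (all names are illustrative, not from the paper):
// the application thread enqueues pointers, a worker thread calls free().
class DeferredFreeThread {
public:
    DeferredFreeThread() : worker_([this] { run(); }) {}

    ~DeferredFreeThread() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stopping_ = true;
        }
        cv_.notify_one();
        worker_.join();            // the worker drains the queue before exiting
        assert(pending_.empty());  // nothing may be left unfreed
    }

    // Called by the application thread in place of free().
    void deferred_free(void* p) {
        if (p == nullptr) return;  // mirror free(NULL): nothing to do
        {
            std::lock_guard<std::mutex> lock(mu_);
            pending_.push_back(p);
        }
        cv_.notify_one();
    }

    // Number of blocks the worker has released so far.
    std::size_t freed_count() {
        std::lock_guard<std::mutex> lock(mu_);
        return freed_;
    }

private:
    void run() {
        std::vector<void*> batch;
        std::unique_lock<std::mutex> lock(mu_);
        for (;;) {
            cv_.wait(lock, [this] { return stopping_ || !pending_.empty(); });
            if (pending_.empty()) return;  // woken only to stop
            batch.swap(pending_);          // take the whole backlog at once
            lock.unlock();
            for (void* p : batch) std::free(p);  // allocator work off the main thread
            lock.lock();
            freed_ += batch.size();
            batch.clear();
        }
    }

    std::mutex mu_;
    std::condition_variable cv_;
    std::vector<void*> pending_;
    bool stopping_ = false;
    std::size_t freed_ = 0;
    std::thread worker_;  // declared last so it starts after the other members exist
};
```

An application would construct one DeferredFreeThread at startup and call deferred_free(p) wherever it previously called free(p). This sketch assumes the underlying allocator tolerates malloc and free being called from different threads, which is an assumption about the build and not something the abstract states. The queue lock is also a cost of its own, so any benefit depends on free() being more expensive than the enqueue.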