Cache index-aware memory allocation

Authors:
Yehuda Afek;Dave Dice;Adam Morrison
Affiliations:
Tel Aviv University, Tel Aviv, Israel;Oracle Labs, Burlington, MA, USA;Tel Aviv University, Tel Aviv, Israel
Venue:
Proceedings of the international symposium on Memory management
Year:
2011

Citing 27
Cited 0

Self-adjusting binary search trees
Journal of the ACM (JACM)

Evaluating Associativity in CPU Caches
IEEE Transactions on Computers

Page placement algorithms for large real-indexed caches
ACM Transactions on Computer Systems (TOCS)

A case for two-way skewed-associative caches
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture

Eliminating cache conflict misses through XOR-based placement functions
ICS '97 Proceedings of the 11th international conference on Supercomputing

The memory fragmentation problem: solved?
Proceedings of the 1st international symposium on Memory management

Cache-conscious data placement
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems

Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture

Automated data-member layout of heap objects to improve memory-hierarchy performance
ACM Transactions on Programming Languages and Systems (TOPLAS)

A fast storage allocator
Communications of the ACM

Hoard: a scalable memory allocator for multithreaded applications
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems

Benchmark health considered harmful
ACM SIGARCH Computer Architecture News

The hardness of cache conscious data placement
POPL '02 Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages

Improving Performance of Large Physically Indexed Caches by Decoupling Memory Addresses from Cache Addresses
IEEE Transactions on Computers

Mostly lock-free malloc
Proceedings of the 3rd international symposium on Memory management

Cache Profiling and the SPEC Benchmarks: A Case Study
Computer

Making Pointer-Based Data Structures Cache Conscious
Computer

Scalable lock-free dynamic memory allocation
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation

Microarchitecture Optimizations for Exploiting Memory-Level Parallelism
Proceedings of the 31st annual international symposium on Computer architecture

Automatic pool allocation: improving performance by controlling data structure layout in the heap
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation

Architecture-conscious hashing
DaMoN '06 Proceedings of the 2nd international workshop on Data management on new hardware

The slab allocator: an object-caching kernel memory allocator
USTC'94 Proceedings of the USENIX Summer 1994 Technical Conference on USENIX Summer 1994 Technical Conference - Volume 1

Archipelago: trading address space for reliability and security
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems

A novel cache architecture with enhanced performance and security
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture

Characterizing the resource-sharing levels in the UltraSPARC T2 processor
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture

Simplifying concurrent algorithms by exploiting hardware transactional memory
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures

The ZCache: Decoupling Ways and Associativity
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture

Abstract

Poor placement of data blocks in memory may degrade application performance by increasing the cache conflict miss rate [18]. For dynamically allocated structures, this placement is typically determined by the memory allocator. Cache index-oblivious allocators may inadvertently place blocks on a restricted fraction of the available cache indexes, needlessly increasing the conflict miss rate. While some allocators are less vulnerable to this phenomenon, no general-purpose malloc allocator is index-aware and addresses this concern methodically. We demonstrate that many existing state-of-the-art allocators are index-oblivious, admitting performance pathologies for certain block sizes. We show that a simple adjustment within the allocator to control the spacing of blocks can provide better index coverage, which in turn reduces the superfluous conflict miss rate in various applications, improving performance with no observed negative consequences. The result is an index-aware allocator. Our technique is general and can easily be applied to most memory allocators and to various processor architectures. Furthermore, we can reduce inter-thread and inter-process conflict misses on processors where threads concurrently share the level-1 cache, such as the Sun UltraSPARC-T2™ and Intel "Nehalem", by coloring the placement of blocks so that allocations for different threads and processes start on different cache indexes.
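
The effect of block spacing on index coverage can be illustrated with a small model. The sketch below is not the paper's allocator. It assumes a hypothetical cache with 64-byte lines and 64 sets, treats the set index as (address / line size) mod (number of sets), and uses an illustrative padding rule (round the spacing up to an odd number of lines) that is one simple way to spread blocks over all indexes when the number of sets is a power of two. All function names and parameters are made up for this example.

```python
# Illustrative model only: cache geometry and padding rule are assumptions,
# not taken from the paper.
LINE_SIZE = 64   # bytes per cache line (assumed)
NUM_SETS = 64    # number of cache indexes/sets, a power of two (assumed)


def cache_index(addr, line_size=LINE_SIZE, num_sets=NUM_SETS):
    """Return the set index that a byte address maps to."""
    return (addr // line_size) % num_sets


def index_coverage(stride, count, base=0,
                   line_size=LINE_SIZE, num_sets=NUM_SETS):
    """Count distinct set indexes hit by `count` block start addresses
    placed `stride` bytes apart, beginning at `base`."""
    return len({cache_index(base + i * stride, line_size, num_sets)
                for i in range(count)})


def index_aware_stride(block_size, line_size=LINE_SIZE):
    """Hypothetical spacing rule: round the block up to whole lines and,
    if that line count is even, add one line of padding. An odd line
    count is coprime with a power-of-two number of sets, so successive
    blocks cycle through every index."""
    lines = -(-block_size // line_size)  # ceiling division
    if lines % 2 == 0:
        lines += 1
    return lines * line_size


if __name__ == "__main__":
    for size in (2048, 4096):
        naive = index_coverage(size, NUM_SETS)
        padded = index_coverage(index_aware_stride(size), NUM_SETS)
        print(f"block {size}: oblivious spacing covers {naive} indexes, "
              f"padded spacing covers {padded}")
```

In this model, 4096-byte blocks placed back to back all start on the same index, while one extra line of spacing spreads 64 consecutive blocks over all 64 indexes. Starting each thread's blocks at a different `base` shifts the index its allocations begin on, which is the intuition behind the coloring mentioned at the end of the abstract.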