ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
An architecture for software-controlled data prefetching
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Stride directed prefetching in scalar processors
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Cache miss heuristics and preloading techniques for general-purpose programs
Proceedings of the 28th annual international symposium on Microarchitecture
Memory data organization for improved cache performance in embedded processor applications
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Cache-conscious data placement
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Prefetching Using Markov Predictors
IEEE Transactions on Computers - Special issue on cache memory and related problems
Cache-conscious structure layout
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Automated data-member layout of heap objects to improve memory-hierarchy performance
ACM Transactions on Programming Languages and Systems (TOPLAS)
Predictor-directed stream buffers
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Dynamic hot data stream prefetching for general-purpose programs
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Data remapping for design space optimization of embedded memory systems
ACM Transactions on Embedded Computing Systems (TECS)
Improving Cache Behavior of Dynamically Allocated Data Structures
PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
The Performance of Runtime Data Cache Prefetching in a Dynamic Optimization System
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Compiler orchestrated prefetching via speculation and predication
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Automatic pool allocation: improving performance by controlling data structure layout in the heap
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Efficient, transparent, and comprehensive runtime code manipulation
Efficient, transparent, and comprehensive runtime code manipulation
SCOPES '07 Proceedingsof the 10th international workshop on Software & compilers for embedded systems
Performance driven data cache prefetching in a dynamic software optimization system
Proceedings of the 21st annual international conference on Supercomputing
Memory management thread for heap allocation intensive sequential applications
Proceedings of the 10th workshop on MEmory performance: DEaling with Applications, systems and architecture
On improving heap memory layout by dynamic pool allocation
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Engineering scalable, cache and space efficient tries for strings
The VLDB Journal — The International Journal on Very Large Data Bases
Redesigning the string hash table, burst trie, and BST to exploit cache
Journal of Experimental Algorithmics (JEA)
Cling: A memory allocator to mitigate dangling pointers
USENIX Security'10 Proceedings of the 19th USENIX conference on Security
On-the-fly structure splitting for heap objects
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Hi-index | 0.00 |
Heap memory allocation plays an important role in modern applications. Conventional heap allocators, however, generally ignore the underlying memory hierarchy of the system, favoring instead a low runtime overhead and fast response times. Unfortunately, with little concern for the memory hierarchy, the data layout may exhibit poor spatial locality, and degrade cache performance. In this paper, we describe a dynamic heap allocation scheme called pool allocation. The strategy aims to improve cache performance by inspecting memory allocation requests, and allocating memory from appropriate heap pools as dictated by the requesting context. The advantages are two fold. First, by pooling together data with a common context, we expect to improve spatial locality, as data fetched to the caches will contain fewer items from different contexts. If the allocation patterns are closely matched to the traversal patterns, the end result is faster memory performance. Second, by pooling heap objects, we expect access patterns to exhibit more regularity, thus creating more opportunities for data prefetching. Our dynamic memory optimizer exploits the increased regularity to insert prefetch instructions at runtime. The optimizations are implemented in DynamoRIO, a dynamic optimization framework. We evaluate the work using various bench-marks, and measure a 17% speedup over gcc −03 on an Athlon MP, and a 13% speedup on a Pentium 4.