Cache-conscious coallocation of hot data streams

Authors:
Trishul M. Chilimbi;Ran Shaham
Affiliations:
One Microsoft Way, Redmond, WA;Amuse Toy & Game Development, Givat Shmuel, Israel
Venue:
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Year:
2006

Citing 17
Cited 11

Procedure placement using temporal ordering information

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Segregating heap objects by reference behavior and lifetime

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Cache-conscious data placement

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Cache-conscious structure layout

Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Cache-conscious structure definition

Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Automated data-member layout of heap objects to improve memory-hierarchy performance

ACM Transactions on Programming Languages and Systems (TOPLAS)
Composing high-performance memory allocators

Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Efficient representations and abstractions for quantifying and exploiting data reference locality

Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
The hardness of cache conscious data placement

POPL '02 Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
An efficient profile-analysis framework for data-layout optimizations

POPL '02 Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
On the Stability of Temporal Data Reference Profiles

Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Improving Cache Behavior of Dynamically Allocated Data Structures

PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
Array regrouping and structure splitting using whole-program reference affinity

Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
The garbage collection advantage: improving program locality

OOPSLA '04 Proceedings of the 19th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Finding your cronies: static analysis for dynamic object colocation

OOPSLA '04 Proceedings of the 19th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Automatic pool allocation: improving performance by controlling data structure layout in the heap

Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Approximations of weighted independent set and hereditary subset problems

COCOON'99 Proceedings of the 5th annual international conference on Computing and combinatorics

Structure Layout Optimization for Multithreaded Programs

Proceedings of the International Symposium on Code Generation and Optimization
AjaxScope: a platform for remotely monitoring the client-side behavior of web 2.0 applications

Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
Efficient runtime tracking of allocation sites in Java

Proceedings of the 6th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
On improving heap memory layout by dynamic pool allocation

Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Streamlining GPU applications on the fly: thread divergence elimination through runtime thread-data remapping

Proceedings of the 24th ACM International Conference on Supercomputing
AjaxScope: A Platform for Remotely Monitoring the Client-Side Behavior of Web 2.0 Applications

ACM Transactions on the Web (TWEB)
Fast and compact hash tables for integer keys

ACSC '09 Proceedings of the Thirty-Second Australasian Conference on Computer Science - Volume 91
Engineering scalable, cache and space efficient tries for strings

The VLDB Journal — The International Journal on Very Large Data Bases
Redesigning the string hash table, burst trie, and BST to exploit cache

Journal of Experimental Algorithmics (JEA)
On-the-fly elimination of dynamic irregularities for GPU computing

Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Complexity analysis and algorithm design for reorganizing data to minimize non-coalesced memory accesses on GPU

Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming

Quantified Score

Hi-index	0.00

Visualization

Abstract

The memory system performance of many programs can be improved by coallocating contemporaneously accessed heap objects in the same cache block. We present a novel profile-based analysis for producing such a layout. The analysis achieves cacheconscious coallocation of a hot data stream H (i.e., a regular data access pattern that frequently repeats) by isolating and combining allocation sites of object instances that appear in H such that intervening allocations coming from other sites are separated. The coallocation solution produced by the analysis is enforced by an automatic tool, cminstr, that redirects a program's heap allocations to a run-time coallocation library comalloc. We also extend the analysis to coallocation at object field granularity. The resulting field coallocation solution generalizes common data restructuring techniques, such as field reordering, object splitting, and object merging, and allows their combination. Furthermore, it provides insight into object restructuring by breaking down the coallocation benefit on a per-technique basis, which provides the opportunity to pick the "sweet spot" for each program. Experimental results using a set of memory-performance-limited benchmarks, including a few SPECInt2000 programs, and Microsoft VisualFoxPro, indicate that programs possess significant coallocation opportunities. Automatic object coallocation improves execution time by 13% on average in the presence of hardware prefetching. Hand-implemented field coallocation solutions for two of the benchmarks produced additional improvements (12% and 22%) but the effort involved suggests implementing an automated version for type-safe languages, such as Java and C#.