Avoiding initialization misses to the heap

Authors: Jarrod A. Lewis (University of Wisconsin-Madison); Bryan Black (Intel Corporation); Mikko H. Lipasti (University of Wisconsin-Madison)
Venue: ISCA '02: Proceedings of the 29th Annual International Symposium on Computer Architecture
Year: 2002

Abstract

This paper investigates a class of main memory accesses, invalid memory traffic, that can be eliminated altogether. Invalid memory traffic is real data traffic that transfers invalid data. By tracking the initialization of dynamic memory allocations, it is possible to identify store instructions that miss the cache and would otherwise fetch uninitialized heap data from memory. The data transfers associated with these initialization misses can be avoided without losing correctness. The memory system property crucial for good performance under heap allocation is cache installation: the ability to allocate and initialize a new object in the cache without a penalty. Tracking heap initialization at cache block granularity enables cache installation mechanisms to provide zero-latency prefetching into the cache. We propose a hardware mechanism, the Allocation Range Cache, that efficiently identifies initializing store misses to the heap and triggers cache installations to avoid invalid memory traffic.

Results: For a 2MB cache, 23% of cache misses to memory (35% of compulsory misses) in the SPEC CINT2000 benchmarks initialize the heap. Using a simple base-bounds range sweeping scheme to track the initialization of the 64 most recent dynamic memory allocations, nearly 100% of all initializing store misses can be identified and installed in the cache without accessing memory. Smashing invalid memory traffic via cache installation at cache block granularity removes 23% of all miss traffic and can improve performance by up to 41%.
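The abstract does not describe the internal organization of the Allocation Range Cache, so the following is only a minimal sketch of one plausible reading of "base-bounds range sweeping". All names (`AllocationRangeCache`, `simulate`) are hypothetical. The parameters are assumptions: 64-byte blocks, FIFO replacement over the 64 most recent allocations, and an idealized unbounded cache in which every miss is compulsory. Each entry records an allocation's base and bound plus a sweep pointer that moves upward as stores touch the range. A store miss is installed without a memory fetch only when its whole block lies between the sweep pointer and the bound. That restriction keeps the sketch from discarding already-initialized bytes or bytes belonging to a neighboring object.

```python
from collections import OrderedDict

BLOCK = 64        # assumed cache block size in bytes
ARC_ENTRIES = 64  # the abstract tracks the 64 most recent allocations


class AllocationRangeCache:
    """Hypothetical ARC sketch: each entry is base -> [bound, sweep].

    Bytes in [sweep, bound) are assumed to be still uninitialized.
    """

    def __init__(self, entries=ARC_ENTRIES, block=BLOCK):
        self.entries = entries
        self.block = block
        self.ranges = OrderedDict()  # insertion order = allocation order (FIFO)

    def allocate(self, base, size):
        # A new allocation starts fully uninitialized; evict the oldest entry if full.
        if base in self.ranges:
            del self.ranges[base]
        elif len(self.ranges) >= self.entries:
            self.ranges.popitem(last=False)
        self.ranges[base] = [base + size, base]

    def free(self, base):
        self.ranges.pop(base, None)

    def _lookup(self, addr):
        for base, entry in self.ranges.items():
            if base <= addr < entry[0]:
                return entry
        return None

    def store(self, addr, miss):
        """Record a store. Return True if its miss can be installed without fetching."""
        entry = self._lookup(addr)
        if entry is None:
            return False  # untracked address: fetch from memory as usual
        bound, sweep = entry
        start = addr - addr % self.block
        end = start + self.block
        # Installable only if the entire block lies in the uninitialized part of the range.
        installable = miss and start >= sweep and end <= bound
        # Sweep past this block; any skipped blocks are conservatively treated as initialized.
        entry[1] = max(sweep, end)
        return installable


def simulate(trace, block=BLOCK):
    """Count (store misses, installable misses) for a trace of malloc/free/store events."""
    arc = AllocationRangeCache(block=block)
    cached = set()  # idealized cache with no capacity limit: only compulsory misses
    misses = installs = 0
    for op, *args in trace:
        if op == "malloc":
            arc.allocate(*args)
        elif op == "free":
            arc.free(*args)
        elif op == "store":
            blk = args[0] // block
            miss = blk not in cached
            cached.add(blk)
            misses += miss
            installs += arc.store(args[0], miss)
    return misses, installs
```

In this sketch, sequentially initializing a 256-byte object with 8-byte stores misses once per 64-byte block, and all four misses are installable: `simulate([("malloc", 0x1000, 256)] + [("store", 0x1000 + i) for i in range(0, 256, 8)])` yields `(4, 4)`. A trailing block that extends past the allocation's bound is never installed, because its remaining bytes may belong to another object.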