Memory safety defends against inadvertent and malicious misuse of memory that may compromise program correctness and security. A critical element of memory safety is zero initialization. The direct cost of zero initialization is surprisingly high: up to 12.7%, with average costs of 2.7 to 4.5%, on a high-performance virtual machine on IA32 architectures. Zero initialization also incurs indirect costs due to its memory bandwidth demands and cache displacement effects. Existing virtual machines either (a) minimize direct costs by zeroing in large blocks, or (b) minimize indirect costs by zeroing in the allocation sequence, which reduces cache displacement and bandwidth demands. This paper evaluates these two widely used zero initialization designs, showing that they make different tradeoffs yet achieve very similar performance. Our analysis inspires three better designs: (1) bulk zeroing with cache-bypassing (non-temporal) instructions to reduce the direct and indirect zeroing costs simultaneously, (2) concurrent non-temporal bulk zeroing that exploits parallel hardware to move work off the application's critical path, and (3) adaptive zeroing, which dynamically chooses between (1) and (2) based on available hardware parallelism. The new software strategies offer speedups sometimes greater than the direct overhead, improving total performance by 3% on average. Our findings invite additional optimizations and microarchitectural support.