Memory safety defends against inadvertent and malicious misuse of memory that may compromise program correctness and security. A critical element of memory safety is zero initialization. The direct cost of zero initialization is surprisingly high: up to 12.7%, with average costs of 2.7 to 4.5%, on a high-performance virtual machine on IA32 architectures. Zero initialization also incurs indirect costs due to its memory bandwidth demands and cache displacement effects. Existing virtual machines either (a) minimize direct costs by zeroing in large blocks, or (b) minimize indirect costs by zeroing in the allocation sequence, which reduces cache displacement and bandwidth demands. This paper evaluates these two widely used zero initialization designs, showing that they make different tradeoffs yet achieve very similar performance. Our analysis inspires three better designs: (1) bulk zeroing with cache-bypassing (non-temporal) instructions to reduce the direct and indirect zeroing costs simultaneously, (2) concurrent non-temporal bulk zeroing that exploits parallel hardware to move work off the application's critical path, and (3) adaptive zeroing, which dynamically chooses between (1) and (2) based on available hardware parallelism. The new software strategies offer speedups sometimes greater than the direct overhead, improving total performance by 3% on average. Our findings invite additional optimizations and microarchitectural support.