MULTILISP: a language for concurrent symbolic computation
ACM Transactions on Programming Languages and Systems (TOPLAS)
Improving locality of reference in a garbage-collecting memory management system
Communications of the ACM
Effective “static-graph” reorganization to improve locality in garbage-collected systems
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Garbage collection: algorithms for automatic dynamic memory management
Compiler-based prefetching for recursive data structures
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Using generational garbage collection to implement cache-conscious data placement
Proceedings of the 1st international symposium on Memory management
Automated data-member layout of heap objects to improve memory-hierarchy performance
ACM Transactions on Programming Languages and Systems (TOPLAS)
ACM Computing Surveys (CSUR)
List processing in real time on a serial computer
Communications of the ACM
A nonrecursive list compacting algorithm
Communications of the ACM
Solution of a problem in concurrent programming control
Communications of the ACM
A parallel, real-time garbage collector
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
A scalable mark-sweep garbage collector on large-scale shared-memory machines
SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
Understanding the connectivity of heap objects
Proceedings of the 3rd international symposium on Memory management
A parallel, incremental and concurrent GC for servers
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Dynamic hot data stream prefetching for general-purpose programs
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Creating and preserving locality of Java applications at allocation and garbage collection times
OOPSLA '02 Proceedings of the 17th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Evaluation of Parallel Copying Garbage Collection on a Shared-Memory Multiprocessor
IEEE Transactions on Parallel and Distributed Systems
Data Flow Analysis for Software Prefetching Linked Data Structures in Java
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Continuous program optimization: A case study
ACM Transactions on Programming Languages and Systems (TOPLAS)
Stride prefetching by dynamically inspecting objects
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
Garbage collection in a large LISP system
LFP '84 Proceedings of the 1984 ACM Symposium on LISP and functional programming
Prefetch injection based on hardware monitoring and object metadata
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Software prefetching for mark-sweep garbage collection: hardware analysis and software redesign
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
The garbage collection advantage: improving program locality
OOPSLA '04 Proceedings of the 19th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Automatic pool allocation: improving performance by controlling data structure layout in the heap
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Garbage collection without paging
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Multiple Page Size Modeling and Optimization
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Parallel garbage collection for shared memory multiprocessors
JVM'01 Proceedings of the 2001 Symposium on Java™ Virtual Machine Research and Technology Symposium - Volume 1
A comparative evaluation of parallel garbage collector implementations
LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Online optimizations driven by hardware performance monitoring
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Data layouts for object-oriented programs
Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Statistically rigorous Java performance evaluation
Proceedings of the 22nd annual ACM SIGPLAN conference on Object-oriented programming systems and applications
Parallel generational-copying garbage collection with a block-structured heap
Proceedings of the 7th international symposium on Memory management
Online Phase-Adaptive Data Layout Selection
ECOOP '08 Proceedings of the 22nd European conference on Object-Oriented Programming
A new approach to parallelising tracing algorithms
Proceedings of the 2009 international symposium on Memory management
Placement optimization using data context collected during garbage collection
Proceedings of the 2009 international symposium on Memory management
The locality of concurrent write barriers
Proceedings of the 2010 international symposium on Memory management
Exploitation of multicore systems in a Java virtual machine
IBM Journal of Research and Development
Parallel memory defragmentation on a GPU
Proceedings of the 2012 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Identifying the sources of cache misses in Java programs without relying on hardware counters
Proceedings of the 2012 international symposium on Memory Management
This paper shows how to reduce cache and TLB misses by changing the order in which a parallel garbage collector copies heap objects. Fewer cache and TLB misses shorten program run time, and parallel garbage collection improves scaling on multiprocessor machines. Technology trends indicate that both memory locality and multiprocessor scaling are growing in importance. Our new algorithm is based on the earlier single-threaded "hierarchical decomposition" algorithm by Wilson, Lam, and Moher. This paper presents a thorough evaluation of parallel hierarchical copying, showing that it improves spatial locality, reduces cache and TLB misses, and speeds up 14 out of 26 benchmarks.
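To make the effect of copy order concrete, the following is a minimal single-threaded sketch, not the paper's parallel algorithm. It assumes a toy heap modelled as a dictionary from object to children, and it approximates hierarchical decomposition by filling one fixed-size block at a time with a small breadth-first subtree. The names `breadth_first_order`, `hierarchical_order`, `same_block_edges`, and `block_size` are illustrative assumptions, not taken from the paper.

```python
from collections import deque


def breadth_first_order(graph, root):
    """Copy order of a plain FIFO (Cheney-style) traversal from the root."""
    order, seen, queue = [], {root}, deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in graph.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return order


def hierarchical_order(graph, root, block_size):
    """Simplified hierarchical order: each block holds a small subtree.

    Assumption: a block holds `block_size` objects of equal size.
    """
    order, copied = [], set()
    pending = deque([root])          # roots of subtrees not yet copied
    while pending:
        start = pending.popleft()
        if start in copied:
            continue
        local = deque([start])       # breadth-first frontier for this block
        filled = 0
        while local and filled < block_size:
            node = local.popleft()
            if node in copied:
                continue
            copied.add(node)
            order.append(node)
            filled += 1
            local.extend(c for c in graph.get(node, ()) if c not in copied)
        pending.extend(local)        # leftovers start later blocks
    return order


def same_block_edges(graph, order, block_size):
    """Count parent-child pointers whose two ends share a block."""
    block = {node: i // block_size for i, node in enumerate(order)}
    return sum(block[p] == block[c] for p in graph for c in graph[p])


# Complete binary tree with 15 nodes: node i has children 2i and 2i + 1.
tree = {i: [2 * i, 2 * i + 1] for i in range(1, 8)}
bfs = breadth_first_order(tree, 1)
hier = hierarchical_order(tree, 1, block_size=3)
print(same_block_edges(tree, bfs, 3), same_block_edges(tree, hier, 3))
```

On this toy tree with three objects per block, breadth-first order keeps 2 of the 14 parent-child pointers inside one block, while the hierarchical order keeps 10. That is the kind of spatial locality the abstract refers to, although real heaps have varying object sizes and shared structure that this sketch ignores.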