Thin locks: featherweight synchronization for Java
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
A real-time garbage collector based on the lifetimes of objects
Communications of the ACM
Thread-specific heaps for multi-threaded programs
Proceedings of the 2nd international symposium on Memory management
Exploiting prolific types for memory management and optimizations
POPL '02 Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Proceedings of the 3rd international symposium on Memory management
Lock reservation: Java locks can mostly do without atomic operations
OOPSLA '02 Proceedings of the 17th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Generation Scavenging: A non-disruptive high performance storage reclamation algorithm
SDE 1 Proceedings of the first ACM SIGSOFT/SIGPLAN software engineering symposium on Practical software development environments
Java server performance: a case study of building efficient, scalable Jvms
IBM Systems Journal
TO-Lock: Removing Lock Overhead Using the Owners' Temporal Locality
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Using Hardware Counters to Automatically Improve Memory Performance
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
NUMA-Aware Java Heaps for Server Applications
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Hardware profile-guided automatic page placement for ccNUMA systems
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Clustering the heap in multi-threaded applications for improved garbage collection
Proceedings of the 8th annual conference on Genetic and evolutionary computation
A two-phase escape analysis for parallel java programs
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Eliminating synchronization-related atomic operations with biased locking and bulk rebiasing
Proceedings of the 21st annual ACM SIGPLAN conference on Object-oriented programming systems, languages, and applications
Practical escape analyses: how good are they?
Proceedings of the 3rd international conference on Virtual execution environments
Data layouts for object-oriented programs
Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
JavaTM just-in-time compiler and virtual machine improvements for server and middleware applications
VM'04 Proceedings of the 3rd conference on Virtual Machine Research And Technology Symposium - Volume 3
WINSYM'99 Proceedings of the 3rd conference on USENIX Windows NT Symposium - Volume 3
Data and thread affinity in openmp programs
Proceedings of the 2008 workshop on Memory access on future processors: a solved problem?
IBM Journal of Research and Development
Memory system performance in a NUMA multicore multiprocessor
Proceedings of the 4th Annual International Conference on Systems and Storage
Proceedings of the international symposium on Memory management
Assessing the scalability of garbage collectors on many cores
PLOS '11 Proceedings of the 6th Workshop on Programming Languages and Operating Systems
Assessing the scalability of garbage collectors on many cores
ACM SIGOPS Operating Systems Review
Parallel memory defragmentation on a GPU
Proceedings of the 2012 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Scalable concurrent and parallel mark
Proceedings of the 2012 international symposium on Memory Management
A template library to integrate thread scheduling and locality management for NUMA multiprocessors
HotPar'12 Proceedings of the 4th USENIX conference on Hot Topics in Parallelism
A study of the scalability of stop-the-world garbage collectors on multicores
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Hi-index | 0.00 |
We propose a novel online method of identifying the preferred NUMA nodes for objects with negligible overhead during the garbage collection time as well as object allocation time. Since the number of CPUs (or NUMA nodes) is increasing recently, it is critical for the memory manager of the runtime environment of an object-oriented language to exploit the low latency of local memory for high performance. To locate the CPU of a thread that frequently accesses an object, prior research uses the runtime information about memory accesses as sampled by the hardware. However, the overhead of this approach is high for a garbage collector. Our approach uses the information about which thread can exclusively access an object, or the Dominant Thread (DoT). The dominant thread of an object is the thread that often most accesses an object so that we do not require memory access samples. Our NUMA-aware GC performs DoT based object copying, which copies each live object to the CPU where the dominant thread was last dispatched before GC. The dominant thread information is known from the thread stack and from objects that are locked or reserved by threads and is propagated in the object reference graph. We demonstrate that our approach can improve the performance of benchmark programs such as SPECpower ssj2008, SPECjbb2005, and SPECjvm2008.We prototyped a NUMAaware memory manager on a modified version of IBM Java VM and tested it on a cc-NUMA POWER6 machine with eight NUMA nodes. Our NUMA-aware GC achieved performance improvements up to 14.3% and 2.0% on average over a JVM that only used the NUMA-aware allocator. The total improvement using both the NUMA-aware allocator and GC is up to 53.1% and 10.8% on average.