The robustness of NUMA memory management
SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
Scheduling and page migration for multiprocessor compute servers
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Operating system support for improving data locality on CC-NUMA compute servers
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Using generational garbage collection to implement cache-conscious data placement
Proceedings of the 1st international symposium on Memory management
Cache-conscious data placement
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Cache-conscious structure layout
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Cache-conscious structure definition
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Performance experiences on Sun's Wildfire prototype
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Thread-specific heaps for multi-threaded programs
Proceedings of the 2nd international symposium on Memory management
Proceedings of the 2001 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Hoard: a scalable memory allocator for multithreaded applications
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Proceedings of the 3rd international symposium on Memory management
The sun fireplane system interconnect
Proceedings of the 2001 ACM/IEEE conference on Supercomputing
Dynamic page placement to improve locality in CC-NUMA multiprocessors for TPC-C
Proceedings of the 2001 ACM/IEEE conference on Supercomputing
SMP system interconnect instrumentation for performance analysis
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Memory System Behavior of Java-Based Middleware
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Using Hardware Counters to Automatically Improve Memory Performance
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
An API for Runtime Code Patching
International Journal of High Performance Computing Applications
Placement optimization using data context collected during garbage collection
Proceedings of the 2009 international symposium on Memory management
Allocation wall: a limiting factor of Java applications on emerging multi-core platforms
Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
NUMA-aware memory manager with dominant-thread-based copying GC
Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
Memory system performance in a NUMA multicore multiprocessor
Proceedings of the 4th Annual International Conference on Systems and Storage
Assessing the scalability of garbage collectors on many cores
PLOS '11 Proceedings of the 6th Workshop on Programming Languages and Operating Systems
Assessing the scalability of garbage collectors on many cores
ACM SIGOPS Operating Systems Review
Scalable concurrent and parallel mark
Proceedings of the 2012 international symposium on Memory Management
A template library to integrate thread scheduling and locality management for NUMA multiprocessors
HotPar'12 Proceedings of the 4th USENIX conference on Hot Topics in Parallelism
A study of the scalability of stop-the-world garbage collectors on multicores
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Hi-index | 0.00 |
We introduce a set of techniques to both measure and optimize memory access locality of Java applications running on cc-NUMA servers.These techniques work at the object level and use information gathered from embedded hardware performance monitors.We propose a new NUMA-aware Java heap layout.In addition, we propose using dynamic object migration during garbage collection to move objects local to the processors accessing them most.Our optimization technique reduced the number of non-local memory accesses in Java workloads generated from actual runs of the SPECjbb2000 benchmark by up to 41%, and also resulted in 40% reduction in workload execution time.