Impact of sharing-based thread placement on multithreaded architectures
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Operating system support for improving data locality on CC-NUMA compute servers
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Hoard: a scalable memory allocator for multithreaded applications
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
NUMA-Aware Java Heaps for Server Applications
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Thread clustering: sharing-aware scheduling on SMP-CMP-SMT multiprocessors
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Multi-probe LSH: efficient indexing for high-dimensional similarity search
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Hardware monitors for dynamic page migration
Journal of Parallel and Distributed Computing
The PARSEC benchmark suite: characterization and architectural implications
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Analytical Modeling of Pipeline Parallelism
PACT '09 Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques
NUMA-aware memory manager with dominant-thread-based copying GC
Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
Feedback-directed pipeline parallelism
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Feedback-directed page placement for ccNUMA via hardware-generated memory traces
Journal of Parallel and Distributed Computing
Memory system performance in a NUMA multicore multiprocessor
Proceedings of the 4th Annual International Conference on Systems and Storage
Proceedings of the international symposium on Memory management
A case for NUMA-aware contention management on multicore systems
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
Matching memory access patterns and data placement for NUMA systems
Proceedings of the Tenth International Symposium on Code Generation and Optimization
Improving execution unit occupancy on SMT-based processors through hardware-aware thread scheduling
Future Generation Computer Systems
Hi-index | 0.00 |
Many multicore multiprocessors have a non-uniform memory architecture (NUMA), and for good performance, data and computations must be partitioned so that (ideally) all threads execute on the processor that holds their data. However, many multithreaded applications show heavy use of shared data structures that are accessed by all threads of the application. Automatic data placement and thread scheduling for these applications is (still) difficult. We present a template library for shared data structures that allows a programmer to express both the data layout (how the data space is partitioned) as well as thread mapping and scheduling (when and where a thread is executed). The template library supports programmers in dividing computations and data for reducing the percentage of costly remote memory accesses in NUMA multicore multiprocessors. Initial experience with ferret, a program with irregular memory access patterns from the PARSEC benchmark suite, shows that this approach can reduce the number of remote accesses from 42% to 10% and results in a performance improvement of 3% without overwhelming the programmer.