A dynamic processor allocation policy for multiprogrammed shared-memory multiprocessors
ACM Transactions on Computer Systems (TOCS)
Scheduling and page migration for multiprocessor compute servers
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
The SGI Origin: a ccNUMA highly scalable server
Proceedings of the 24th annual international symposium on Computer architecture
Scaling application performance on a cache-coherent multiprocessor
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
IPPS '95 Proceedings of the 9th International Symposium on Parallel Processing
User-Level Dynamic Page Migration for Multiprogrammed Shared-Memory Multiprocessors
ICPP '00 Proceedings of the Proceedings of the 2000 International Conference on Parallel Processing
affinity-on-next-touch: increasing the performance of an industrial PDE solver on a cc-NUMA system
Proceedings of the 19th annual international conference on Supercomputing
Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
Enhancing operating system support for multicore processors by using hardware performance monitoring
ACM SIGOPS Operating Systems Review
Dynamic data migration for structured AMR solvers
International Journal of Parallel Programming
Geographical locality and dynamic data migration for OpenMP implementations of adaptive PDE solvers
IWOMP'05/IWOMP'06 Proceedings of the 2005 and 2006 international conference on OpenMP shared memory parallel programming
A case for NUMA-aware contention management on multicore systems
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
FELI: HW/SW support for on-chip distributed shared memory in multicores
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
Nonuniform memory affinity strategy in multithreaded sparse matrix computations
Proceedings of the 2012 Symposium on High Performance Computing
ADAPT: A framework for coscheduling multithreaded programs
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Hi-index | 0.00 |
Current shared-memory multiprocessor CC-NUMA architectures provide a global address space to applications by hardware. However, even though the memory is virtually shared, it is actually physically distributed. Since memory nodes are distributed across the system, the cost of the memory accesses depends on the distance between the node that accesses the data and the node that physically contains the data. To reduce the impact of a bad initial memory placement, some operating systems offer a dynamic memory migration mechanism.In this paper, we want to demonstrate that memory migration mechanisms are a useful approach, but that their performance depends more on related issues, such as the processor scheduling, than on the mechanism itself. To show that, we evaluate the case of the automatic memory migration mechanism provided by IRIX, in Origin systems.We have evaluated several workloads of OpenMP applications under different system conditions such as the processor scheduling policy or the system load. In particular, we have focused on the effects of the page migration mechanism on the CPU time consumed by each application, the processor allocation received, and the speedup, when applying performance-driven scheduling policies.Results show that, if the scheduler is memory conscious, that is, it maintains as much as possible the system stable, the automatic memory page migration mechanism provided by IRIX will improve the execution time of OpenMPapplications. Experiments also show that the combination of performance-driven policies and the memory migration mechanism results in a system that can be automatically self-evaluated and self-configured.