Evaluation of the memory page migration influence in the system performance: the case of the SGI O2000

Authors:
Julita Corbalan;Xavier Martorell;Jesus Labarta
Affiliations:
DAC-UPC. Barcelona, Spain;DAC-UPC. Barcelona, Spain;DAC-UPC. Barcelona, Spain
Venue:
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Year:
2003

Abstract

Current shared-memory multiprocessor CC-NUMA architectures provide a global address space to applications in hardware. However, even though the memory is virtually shared, it is physically distributed. Since memory nodes are distributed across the system, the cost of a memory access depends on the distance between the node that accesses the data and the node that physically holds it. To reduce the impact of a bad initial memory placement, some operating systems offer a dynamic memory migration mechanism.

In this paper, we demonstrate that memory migration mechanisms are a useful approach, but that their performance depends more on related issues, such as processor scheduling, than on the mechanism itself. To show this, we evaluate the automatic memory migration mechanism provided by IRIX on Origin systems.

We have evaluated several workloads of OpenMP applications under different system conditions, such as the processor scheduling policy and the system load. In particular, we focus on the effects of the page migration mechanism on the CPU time consumed by each application, the processor allocation received, and the speedup, when applying performance-driven scheduling policies.

Results show that, if the scheduler is memory conscious, that is, it keeps the system as stable as possible, the automatic memory page migration mechanism provided by IRIX improves the execution time of OpenMP applications. Experiments also show that the combination of performance-driven policies and the memory migration mechanism results in a system that can evaluate and configure itself automatically.