A cost-effective heuristic to schedule local and remote memory in cluster computers
The Journal of Supercomputing
Remote Memory Access (RMA) hardware allows a given motherboard in a cluster to directly access the memory installed in a remote motherboard of the same cluster. In recent works, this capability has been used to extend the addressable memory space of selected motherboards, enabling a better balance of main memory resources among cluster applications. This approach is much more cost-effective than implementing a full-fledged shared memory system. In this context, the memory scheduler is in charge of finding a distribution of local and remote memory that maximizes performance while guaranteeing a minimum QoS among the applications. Note that since changing the memory distribution is a slow process involving several motherboards, the memory scheduler needs to make sure that the target distribution provides better performance than the current one. In this paper, a performance predictor is designed to find the best memory distribution for a given set of applications executing on a cluster motherboard. The predictor uses simple hardware counters to estimate the expected impact of the different memory distributions on performance. The hardware counters provide the predictor with information about the time spent in the processor, in memory accesses, and in the network. The performance model used by the predictor has been validated in a detailed microarchitectural simulator using real benchmarks. Results show that the prediction never deviates more than 5% from the real results, and deviates less than 0.5% in most cases.
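To make the idea concrete, a minimal sketch of such a counter-based predictor is shown below. This is not the authors' model: the function names, the additive time decomposition, and the latency constants are all illustrative assumptions, chosen only to show how processor, memory, and network components measured by hardware counters could be combined to rank candidate local/remote distributions.

```python
# Illustrative sketch, not the paper's actual model. Assumes execution time
# decomposes additively into processor time plus memory-access time, where a
# remote access pays an extra network latency. Latency values are hypothetical
# placeholders; a real predictor would calibrate them from hardware counters.

LOCAL_LATENCY = 100e-9   # seconds per local memory access (assumed)
REMOTE_LATENCY = 1.0e-6  # seconds per remote access, incl. network (assumed)

def predict_time(cpu_time, mem_accesses, local_fraction):
    """Predicted execution time when `local_fraction` of the application's
    memory accesses hit local memory and the rest go to remote memory."""
    local = mem_accesses * local_fraction
    remote = mem_accesses * (1.0 - local_fraction)
    return cpu_time + local * LOCAL_LATENCY + remote * REMOTE_LATENCY

def best_fraction(cpu_time, mem_accesses, candidates):
    """Candidate local-memory fraction with the lowest predicted time,
    standing in for the memory scheduler's search over distributions."""
    return min(candidates, key=lambda f: predict_time(cpu_time, mem_accesses, f))
```

In this toy model more local memory is always faster; the real scheduler's problem is harder because local memory is a shared, limited resource, so raising one application's local fraction lowers another's, and the predictor is what lets the scheduler compare such trade-offs before paying the cost of migrating pages.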