A cost-effective heuristic to schedule local and remote memory in cluster computers
The Journal of Supercomputing
Remote Memory Access (RMA) hardware allows a given motherboard in a cluster to directly access the memory installed in a remote motherboard of the same cluster. In recent works, this capability has been used to extend the addressable memory space of selected motherboards, enabling a better balance of main memory resources among cluster applications. This approach is much more cost-effective than implementing a full-fledged shared memory system. In this context, the memory scheduler is in charge of finding a distribution of local and remote memory that maximizes performance while guaranteeing a minimum QoS among the applications. Note that since changing the memory distribution is a slow process involving several motherboards, the memory scheduler needs to make sure that the target distribution provides better performance than the current one. In this paper, a performance predictor is designed to find the best memory distribution for a given set of applications executing on a cluster motherboard. The predictor uses simple hardware counters to estimate the expected impact of the different memory distributions on performance. The hardware counters provide the predictor with information about the time spent in the processor, in memory accesses, and in the network. The performance model used by the predictor has been validated in a detailed microarchitectural simulator using real benchmarks. Results show that the prediction never deviates more than 5% from the real results, and deviates less than 0.5% in most cases.
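To make the idea concrete, a minimal sketch of such a counter-based predictor is shown below. This is not the authors' model: the function names, the additive time decomposition, and the latency constants are all illustrative assumptions, chosen only to show how processor, memory, and network components measured by hardware counters could be combined to rank candidate local/remote distributions.

```python
# Illustrative sketch, not the paper's actual model. Assumes execution time
# decomposes additively into processor time plus memory-access time, where a
# remote access pays an extra network latency. Latency values are hypothetical
# placeholders; a real predictor would calibrate them from hardware counters.

LOCAL_LATENCY = 100e-9   # seconds per local memory access (assumed)
REMOTE_LATENCY = 1.0e-6  # seconds per remote access, incl. network (assumed)

def predict_time(cpu_time, mem_accesses, local_fraction):
    """Predicted execution time when `local_fraction` of the application's
    memory accesses hit local memory and the rest go to remote memory."""
    local = mem_accesses * local_fraction
    remote = mem_accesses * (1.0 - local_fraction)
    return cpu_time + local * LOCAL_LATENCY + remote * REMOTE_LATENCY

def best_fraction(cpu_time, mem_accesses, candidates):
    """Candidate local-memory fraction with the lowest predicted time,
    standing in for the memory scheduler's search over distributions."""
    return min(candidates, key=lambda f: predict_time(cpu_time, mem_accesses, f))
```

In this toy model more local memory is always faster; the real scheduler's problem is harder because local memory is a shared, limited resource, so raising one application's local fraction lowers another's, and the predictor is what lets the scheduler compare such trade-offs before paying the cost of migrating pages.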