A cluster computer performance predictor for memory scheduling

  • Authors:
  • Mónica Serrano, Julio Sahuquillo, Houcine Hassan, Salvador Petit, José Duato

  • Affiliations:
  • Department of Computer Engineering, Universidad Politécnica de Valencia, Valencia, Spain (all authors)

  • Venue:
  • ICA3PP'11: Proceedings of the 11th International Conference on Algorithms and Architectures for Parallel Processing, Volume Part II
  • Year:
  • 2011

Abstract

Remote Memory Access (RMA) hardware allows a motherboard in a cluster to directly access the memory installed in a remote motherboard of the same cluster. Recent works have used this capability to extend the addressable memory space of selected motherboards, which enables a better balance of main-memory resources among cluster applications. This approach is much more cost-effective than implementing a full-fledged shared-memory system. In this context, the memory scheduler is in charge of finding a distribution of local and remote memory that maximizes performance while guaranteeing a minimum QoS for the applications. Because changing the memory distribution is a slow process involving several motherboards, the scheduler must make sure that the target distribution performs better than the current one before switching to it. In this paper, a performance predictor is designed to find the best memory distribution for a given set of applications executing on a cluster motherboard. The predictor uses simple hardware counters to estimate the performance impact of different memory distributions; these counters report the time spent in the processor, in memory accesses, and in the network. The performance model used by the predictor has been validated in a detailed microarchitectural simulator using real benchmarks. Results show that the predicted performance never deviates by more than 5% from the actual results, and the deviation is below 0.5% in most cases.
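To make the idea of a counter-driven predictor more concrete, here is a minimal sketch, not the authors' model. Only the split of execution time into processor, local-memory and network (remote-memory) components comes from the abstract. Everything else is an assumption made for illustration: the fixed latencies `LOCAL_LAT` and `REMOTE_LAT`, the premise that accesses are spread uniformly over an application's memory (so the remote-access share equals the remote-memory share), the exhaustive search over candidate remote fractions, the QoS rule expressed as a maximum slowdown versus an all-local placement, and the 5% migration threshold. The names `App`, `predict_cycles`, `best_distribution` and `should_migrate` are likewise hypothetical.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical average access latencies in cycles (assumptions, not from the paper).
LOCAL_LAT = 100.0
REMOTE_LAT = 1000.0


@dataclass
class App:
    name: str
    cpu_cycles: float    # time spent in the processor (from hardware counters)
    mem_accesses: int    # main-memory accesses (from hardware counters)
    footprint_gb: float  # memory the application needs


def predict_cycles(app: App, remote_fraction: float) -> float:
    """Predicted execution time when `remote_fraction` of the app's memory is remote.

    Assumes accesses are uniform over the footprint, so the share of remote
    accesses equals the share of memory placed on the remote motherboard.
    """
    local_time = app.mem_accesses * (1.0 - remote_fraction) * LOCAL_LAT
    remote_time = app.mem_accesses * remote_fraction * REMOTE_LAT  # network time
    return app.cpu_cycles + local_time + remote_time


def total_cycles(apps, dist) -> float:
    """Sum of predicted times for one remote fraction per application."""
    return sum(predict_cycles(a, f) for a, f in zip(apps, dist))


def best_distribution(apps, local_gb, candidates, max_slowdown):
    """Exhaustively try one remote fraction per app and return the feasible
    distribution with the lowest total predicted time (or None if none fits)."""
    best, best_cost = None, float("inf")
    for dist in product(candidates, repeat=len(apps)):
        local_use = sum(a.footprint_gb * (1.0 - f) for a, f in zip(apps, dist))
        if local_use > local_gb:
            continue  # does not fit in the motherboard's local memory
        # Hypothetical QoS rule: no app may run more than max_slowdown times
        # slower than it would with all of its memory local.
        if any(predict_cycles(a, f) > max_slowdown * predict_cycles(a, 0.0)
               for a, f in zip(apps, dist)):
            continue
        cost = total_cycles(apps, dist)
        if cost < best_cost:
            best, best_cost = dist, cost
    return best


def should_migrate(apps, current, target, min_gain=0.05) -> bool:
    """Redistribute only if the target is predicted to be at least `min_gain`
    (relative) faster, since moving memory between motherboards is slow."""
    return total_cycles(apps, target) < (1.0 - min_gain) * total_cycles(apps, current)


if __name__ == "__main__":
    apps = [App("memory-bound", 1e6, 10_000, 4.0),
            App("compute-bound", 5e6, 1_000, 4.0)]
    target = best_distribution(apps, local_gb=6.0,
                               candidates=(0.0, 0.25, 0.5), max_slowdown=2.0)
    print(target)                                  # (0.0, 0.5)
    print(should_migrate(apps, (0.5, 0.0), target))  # True
```

In this toy setting only 6 GB of the 8 GB needed fits locally, so some memory must go remote. The search keeps the memory-bound application fully local and pushes half of the compute-bound application's memory to the remote motherboard, because that placement costs the fewest extra cycles. The `should_migrate` check reflects the abstract's point that a slow redistribution should only be triggered when the predicted gain clearly outweighs staying with the current placement.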