Cluster computers are a cost-effective alternative to supercomputers. In these systems, the memory address space of each processor is commonly restricted to its local motherboard, which is much cheaper than implementing full-fledged shared memory across motherboards. However, this restriction can leave memory usage unevenly balanced among motherboards.

Meanwhile, remote memory access (RMA) hardware provides fast interconnects among the motherboards of a cluster, and RMA devices can be used to access remote RAM from a local motherboard. This work exploits that capability to make better global use of the total RAM in the system. More precisely, the address space of local applications is extended to remote motherboards and used to access remote RAM.

This paper presents an ideal memory scheduling algorithm and proposes a cost-effective heuristic to allocate local and remote memory among local applications. Compared with the ideal algorithm, the heuristic obtains the same or very similar results at a much lower computational cost. In addition, we analyze how the performance of standalone applications varies with the distribution of memory among regions (local, local to board, and remote). This study is then extended to an arbitrary number of concurrent applications. Experimental results show that a QoS parameter is needed to avoid unacceptable performance degradation.
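The abstract does not describe how the ideal algorithm or the heuristic work, so the following is only a minimal sketch of the general idea under loudly stated assumptions, not the authors' method. It assumes a hypothetical linear slowdown model in which an application's slowdown grows with the fraction of its memory placed remotely, collapses the three regions into just "local" and "remote", allocates memory in fixed-size chunks, and treats the QoS parameter as an upper bound on each application's slowdown. All names (`App`, `slowdown`, `ideal_schedule`, `greedy_schedule`, `sensitivity`, `qos`) are hypothetical. The "ideal" scheduler is an exhaustive search with exponential cost; the heuristic is a greedy allocator.

```python
import itertools
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class App:
    name: str
    demand: int         # hypothetical: memory footprint in chunks (must be > 0)
    sensitivity: float  # hypothetical: extra slowdown if 100% of memory is remote


def slowdown(app, local):
    """Hypothetical linear model: slowdown grows with the remote fraction."""
    remote_fraction = (app.demand - local) / app.demand
    return 1.0 + app.sensitivity * remote_fraction


def total_slowdown(apps, alloc):
    return sum(slowdown(a, x) for a, x in zip(apps, alloc))


def ideal_schedule(apps, capacity, qos=None):
    """Exhaustive search over every split of local chunks (exponential cost)."""
    best, best_cost = None, math.inf
    for alloc in itertools.product(*(range(a.demand + 1) for a in apps)):
        if sum(alloc) > capacity:
            continue  # exceeds the local memory available
        slows = [slowdown(a, x) for a, x in zip(apps, alloc)]
        if qos is not None and max(slows) > qos:
            continue  # violates the hypothetical per-application QoS bound
        cost = sum(slows)
        if cost < best_cost:
            best, best_cost = alloc, cost
    return best  # None if no allocation is feasible


def greedy_schedule(apps, capacity, qos=None):
    """Greedy heuristic: reserve QoS minimums, then hand out chunks one by one."""
    alloc = [0] * len(apps)
    if qos is not None:
        for i, a in enumerate(apps):
            # Smallest local allocation that keeps this app within the QoS bound.
            need = next((x for x in range(a.demand + 1)
                         if slowdown(a, x) <= qos), None)
            if need is None:
                return None
            alloc[i] = need
        if sum(alloc) > capacity:
            return None  # QoS cannot be met with this much local memory
    remaining = capacity - sum(alloc)
    while remaining > 0:
        candidates = [i for i, a in enumerate(apps) if alloc[i] < a.demand]
        if not candidates:
            break  # every application already runs fully from local memory
        # Under the linear model, one local chunk reduces app j's slowdown
        # by sensitivity / demand, so pick the app with the largest reduction.
        i = max(candidates, key=lambda j: apps[j].sensitivity / apps[j].demand)
        alloc[i] += 1
        remaining -= 1
    return tuple(alloc)


if __name__ == "__main__":
    apps = [App("A", 4, 2.0), App("B", 6, 3.0), App("C", 3, 0.5)]
    for qos in (None, 1.6):
        g = greedy_schedule(apps, 8, qos)
        i = ideal_schedule(apps, 8, qos)
        print(qos, g, total_slowdown(apps, g), i, total_slowdown(apps, i))
```

Because every chunk has a constant benefit per application under this linear model, the greedy allocation reaches the same total slowdown as the exhaustive search while touching each chunk only once. With a nonlinear performance model that guarantee would no longer hold, which is one reason to validate a heuristic against an exhaustive "ideal" baseline. A tight QoS bound (for example, `qos=1.2` with 8 local chunks above) makes both schedulers report that no acceptable allocation exists.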