Performance Evaluation of Clusters with ccNUMA Nodes - A Case Study

Authors:
Abdullah Kayi; Edward Kornkven; Tarek El-Ghazawi; Samy Al-Bahra; Gregory B. Newby
Venue:
HPCC '08 Proceedings of the 2008 10th IEEE International Conference on High Performance Computing and Communications
Year:
2008

Abstract

In the quest for higher performance, and with multicore chips increasingly available, many systems now pack more processors into each node. Adopting a ccNUMA node architecture in such systems promises a balance between cost and performance. In this paper, a 2312-core Opteron system based on Sun Fire servers serves as a case study for examining the performance issues associated with such architectures. We characterize the performance behavior of the system, focusing on the node level, under different configurations. We show that the benefits of larger nodes can be severely limited by several factors, which we isolate and whose associated performance losses we assess. The results reveal that these problems are caused mainly by topological imbalances, limitations of the cache coherence protocol in use, the distribution of operating system services, and the lack of intelligent memory affinity management.