Bandwidth Adaptive Cache Coherence Optimizations for Chip Multiprocessors
International Journal of Parallel Programming
In the quest for higher performance, and with multicore chips increasingly available, many systems now pack more processors into each node. Adopting a ccNUMA node architecture in these cases promises a balance between cost and performance. In this paper, a system with 2312 Opteron cores based on Sun Fire servers serves as a case study for examining the performance issues associated with such architectures. We characterize the performance behavior of the system, focusing on the node level, under different configurations. We show that the benefits of larger nodes can be severely limited by several factors, which we isolate and whose associated performance losses we assess. The results reveal that these problems are mainly caused by topological imbalances, limitations of the cache coherency protocol used, the distribution of operating system services, and the lack of intelligent management of memory affinity.