Optimizing server placement for parallel I/O in switch-based clusters

  • Authors:
  • Jan-Jan Wu;Yi-Fang Lin;Da-Wei Wang;Chien-Min Wang

  • Affiliations:
  • Institute of Information Science, Academia Sinica, Taipei 115, Taiwan, ROC;Institute of Information Science, Academia Sinica, Taipei 115, Taiwan, ROC and Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan, ROC;Institute of Information Science, Academia Sinica, Taipei 115, Taiwan, ROC;Institute of Information Science, Academia Sinica, Taipei 115, Taiwan, ROC

  • Venue:
  • Journal of Parallel and Distributed Computing
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper, we consider how to optimize I/O server placement in order to improve parallel I/O performance in switch-based clusters. The significant advances in cluster networks in recent years have made it practical to connect tens of thousands of hosts via networks that have enormous and scalable total capacity, and in which communications between a host and any other host incur the same cost. The same cost property frees users from consideration of network contention and allows them to concentrate on load-balancing issues. We formulate the server placement problem on a cluster that has the same cost property as a weighted bipartite matching with the goal of balancing the workload on the I/O nodes. To find an optimal solution to this problem, we propose an O(n^3^2m(logn+logm)) algorithm, called Load Balance Matching (LBM), where n is the number of compute nodes and m is the number of I/O servers. We also investigate server placement for general clusters in which multiple same-cost subclusters are interconnected to form a large cluster. This class of clusters typically adopt irregular topologies that allow the construction of scalable systems with an incremental expansion capability. Also, due to the limited bandwidth on network links between subclusters, network link contention is a major concern when distributing servers over the entire network. We show that finding an optimal placement strategy for general clusters with the goal of minimizing link contention is computationally intractable. To resolve this problem, we propose a hierarchical strategy that places servers in two steps. First, to minimize link contention, we decide which subcluster each server should be assigned to. We propose a tree-based heuristic algorithm, called Load Balance Traversing (LBT), to solve this problem. In the second step, the LBM algorithm decides the location of each server within a subcluster. Our simulation results demonstrate that LBT achieves a significant improvement in parallel I/O performance over four other algorithms, and is near-optimal in some cases.