Placement of I/O servers to improve parallel I/O performance on switch-based clusters

  • Authors: Jan-Jan Wu; Da-Wei Wang; Yih-Fang Lin

  • Affiliations: Academia Sinica, Taiwan, R.O.C. (all three authors)

  • Venue: ICS '03: Proceedings of the 17th Annual International Conference on Supercomputing
  • Year: 2003

Abstract

Switch-based clusters, that is, networks of workstations/PCs connected by commodity switches, have been an appealing vehicle for high-performance computing. Despite their attractive features, cluster systems still have some limitations compared with traditional massively parallel machines. First, cluster systems usually have a limited number of processing nodes, making full utilization of the computing power provided by each processing node a critical issue. Second, cluster systems are usually constructed with slower interconnects, making the network speed, not the disk speed, the limiting factor for parallel I/O performance. The notion of part-time I/O is commonly used for I/O in clusters: a subset of processing nodes become I/O nodes at I/O time and return to computation after finishing the I/O operation. Careful assignment of part-time I/O nodes is the key to overcoming the above two limiting factors. Prior work reported an optimal assignment strategy for cluster systems with shared-media interconnects, based on an optimization that minimizes the total amount of remote data transfers in parallel I/O. In this paper, we show that load balance on the I/O nodes, not the total amount of remote data transfers, is the key optimization criterion for assigning part-time I/O nodes in switch-based clusters. We formulate the assignment problem as a weighted bipartite matching problem with the goal of balancing the workload on the I/O nodes. We then propose an O(n^{3/2} m (log n + log m)) algorithm to find an optimal solution for this problem, where n is the number of compute nodes and m is the number of I/O nodes. Experimental results on a 16-node PC cluster and simulation results for larger clusters are reported.
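
To make the contrast between the two optimization criteria concrete, the following is a minimal sketch on a toy instance. It is not the algorithm proposed in the paper: it enumerates every placement by brute force, and the cost model is an assumption made for illustration only. In this assumed model, `data[i][j]` is the amount of data compute node `i` holds for I/O server `j`, a placement maps each I/O server to a distinct host node, and a server's load is the data it must receive from every node other than its host. The names `server_loads` and `best_placement` and the numbers in `data` are hypothetical.

```python
from itertools import permutations

def server_loads(data, placement):
    """Remote data each I/O server receives (assumed cost model).

    placement[j] is the compute node that hosts I/O server j; data already
    on the host node is local and does not count toward the load.
    """
    m = len(placement)
    col_totals = [sum(row[j] for row in data) for j in range(m)]
    return [col_totals[j] - data[placement[j]][j] for j in range(m)]

def best_placement(data, m, cost):
    """Brute-force search over all placements of m servers on distinct nodes.

    Exponential in general; only meant for tiny illustrative inputs.
    """
    n = len(data)
    return min(permutations(range(n), m),
               key=lambda p: cost(server_loads(data, p)))

# Hypothetical toy instance: 3 compute nodes, 2 I/O servers.
data = [
    [8, 6],  # node 0
    [7, 0],  # node 1
    [0, 1],  # node 2
]

# Criterion 1: minimize the total amount of remote data transferred.
by_total = best_placement(data, 2, cost=sum)

# Criterion 2: minimize the heaviest server load (ties broken by total).
by_balance = best_placement(data, 2, cost=lambda loads: (max(loads), sum(loads)))

print(by_total, server_loads(data, by_total))      # (1, 0) [8, 1]
print(by_balance, server_loads(data, by_balance))  # (0, 2) [7, 6]
```

On this instance the two criteria pick different placements: minimizing total transfer moves 9 units overall but leaves one server receiving 8, while the balanced placement moves 13 units yet caps the busiest server at 7. Under the abstract's premise that the network, not the disk, is the bottleneck on a switch-based cluster, the busiest server is what would bound the I/O time in this simplified model.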