Optimizing transport protocol parameters for large scale PC cluster and its evaluation with parallel data mining

  • Authors:
  • Masato Oguchi;Masaru Kitsuregawa

  • Affiliations:
  • Institute of Industrial Science, The University of Tokyo, 7-22-1 Roppongi, Minato-ku Tokyo 106-8558, Japan;Institute of Industrial Science, The University of Tokyo, 7-22-1 Roppongi, Minato-ku Tokyo 106-8558, Japan

  • Venue:
  • Cluster Computing
  • Year:
  • 2000

Abstract

PC clusters have recently been studied intensively as a basis for next-generation large-scale parallel computers. ATM technology is a strong candidate for the de facto standard in high-speed communication networks, so an ATM-connected PC cluster is a promising platform for future high-performance computing from a cost/performance point of view. Data-intensive applications, such as data mining and ad hoc query processing in databases, are considered as important for massively parallel processors as conventional scientific calculations. It is therefore meaningful to investigate the feasibility of such applications on an ATM-connected PC cluster. This paper reports on an ATM-connected PC cluster consisting of 100 PCs and evaluates the characteristics of a transport layer protocol for the cluster. Point-to-point communication performance is measured and discussed as the TCP window size parameter is varied. Parallel data mining is implemented and evaluated on the cluster. Retransmission caused by cell loss at the ATM switch is analyzed, and retransmission parameters suitable for parallel processing on the large-scale PC cluster are clarified. TCP with default parameters cannot provide good performance, because many collisions occur during the all-to-all multicasting executed on the large-scale PC cluster. With TCP parameters optimized as proposed, performance improves for parallel data mining on 100 PCs.
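
The abstract does not say how the TCP window size was changed on the cluster. As a rough illustration only, the sketch below shows one common way an application can influence the window on a single connection: requesting larger socket buffers, sized from the bandwidth-delay product of the link. The function names, the 155 Mbit/s link rate, and the 1 ms round-trip time are assumptions chosen for the example and are not taken from the paper.

```python
import socket


def bandwidth_delay_product(bandwidth_bits_per_s: int, rtt_us: int) -> int:
    """Return the bytes that must be in flight to keep a link busy.

    Integer arithmetic is used so the result is exact:
    bits/s * microseconds / (8 bits per byte * 1_000_000 us per s).
    """
    return bandwidth_bits_per_s * rtt_us // 8_000_000


def open_socket_with_buffers(buffer_bytes: int) -> socket.socket:
    """Create a TCP socket and request send/receive buffers of the given size.

    The operating system may round, double, or cap the requested value,
    so the size actually granted should be read back with getsockopt.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buffer_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buffer_bytes)
    return sock


# Hypothetical link: 155 Mbit/s with a 1 ms (1000 us) round-trip time.
ASSUMED_BANDWIDTH_BITS_PER_S = 155_000_000
ASSUMED_RTT_US = 1000

bdp_bytes = bandwidth_delay_product(ASSUMED_BANDWIDTH_BITS_PER_S, ASSUMED_RTT_US)

sock = open_socket_with_buffers(bdp_bytes)
granted_rcvbuf = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
sock.close()
```

A window smaller than the bandwidth-delay product leaves the link idle while the sender waits for acknowledgements, which is one reason point-to-point throughput depends on this parameter. The retransmission behavior that the abstract also discusses is usually controlled by system-wide settings rather than per-socket options, so it is not covered by this sketch.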