MegaProto/E: power-aware high-performance cluster with commodity technology

  • Authors:
  • Taisuke Boku;Mitsuhisa Sato;Daisuke Takahashi;Hiroshi Nakashima;Hiroshi Nakamura;Satoshi Matsuoka;Yoshihiko Hotta

  • Affiliations:
  • University of Tsukuba;University of Tsukuba;University of Tsukuba;Toyohashi University of Technology;University of Tokyo;Tokyo Institute of Technology;University of Tsukuba

  • Venue:
  • IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
  • Year:
  • 2006

Abstract

In our research project, "Mega-Scale Computing Based on Low-Power Technology and Workload Modeling", we have been developing a prototype cluster built solely from commodity technology rather than ASICs or FPGAs. Its packaging is extremely compact and dense, and its performance/power ratio is very high. Our previous prototype system, "MegaProto", demonstrated that one cluster unit consisting of 16 commodity low-power processors can be implemented in a chassis only 1U high, and that it is capable of up to 2.8 times higher performance/power ratio than ordinary high-performance dual-Xeon 1U server units. We have improved MegaProto by replacing the CPU and enhancing the I/O performance. The new cluster unit, "MegaProto/E", has 16 Transmeta Efficeon processors and achieves 32 GFlops of peak performance, which is 2.2-fold greater than that of the original unit. The cluster unit is equipped with an independent dual Gigabit Ethernet network, including dual 24-port switches. The maximum power consumption of the cluster unit is 320 W, which is comparable to that of today's high-end PC servers for high-performance clusters. Performance evaluation using the NPB kernels and HPL shows that MegaProto/E outperforms a dual-Xeon server in all the benchmarks, with a performance ratio relative to the dual-Xeon server ranging from 1.3 to 3.7. These results show that implementing a number of ultra-low-power processors in compact packaging is an excellent way to achieve extremely high performance in applications with a certain degree of parallelism. We are now building a multi-unit cluster with 128 CPUs (8 units) to show that this advantage still holds at larger scale.
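
The figures quoted in the abstract imply a few derived quantities, which the following minimal Python sketch works out. The constant names and the `derived_figures` helper are hypothetical and do not come from the paper. Two assumptions are made: "2.2-fold greater" is read as 2.2 times the original unit's peak, and peak performance is taken to scale linearly with the number of units. The peak-per-watt value divides peak GFlops by maximum power, so it is not the measured performance/power ratio reported in the evaluation.

```python
# Figures quoted in the abstract for one MegaProto/E cluster unit.
CPUS_PER_UNIT = 16
UNIT_PEAK_GFLOPS = 32.0
UNIT_MAX_POWER_W = 320.0
PEAK_GAIN_OVER_MEGAPROTO = 2.2  # assumption: read as "2.2 times the original peak"
PLANNED_UNITS = 8


def derived_figures():
    """Return quantities implied by the abstract's numbers (hypothetical helper)."""
    return {
        # Peak performance of a single processor.
        "peak_gflops_per_cpu": UNIT_PEAK_GFLOPS / CPUS_PER_UNIT,
        # Peak GFlops per watt at maximum power (not a measured ratio).
        "peak_gflops_per_watt": UNIT_PEAK_GFLOPS / UNIT_MAX_POWER_W,
        # Implied peak of the original MegaProto unit.
        "original_unit_peak_gflops": UNIT_PEAK_GFLOPS / PEAK_GAIN_OVER_MEGAPROTO,
        # Size of the planned multi-unit cluster.
        "planned_cpus": CPUS_PER_UNIT * PLANNED_UNITS,
        # Assumption: peak scales linearly with the number of units.
        "planned_peak_gflops": UNIT_PEAK_GFLOPS * PLANNED_UNITS,
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions, each processor contributes 2 GFlops of peak performance, and the planned 8-unit system has 128 CPUs, matching the count given in the abstract.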