Grid Datafarm Architecture for Petascale Data Intensive Computing

  • Authors:
  • Osamu Tatebe;Youhei Morita;Satoshi Matsuoka;Noriyuki Soda;Satoshi Sekiguchi

  • Venue:
  • CCGRID '02 Proceedings of the 2nd IEEE/ACM International Symposium on Cluster Computing and the Grid
  • Year:
  • 2002

Abstract

The Grid Datafarm (Gfarm) architecture is designed for global petascale data-intensive computing. It provides a global parallel filesystem with online petascale storage, scalable I/O bandwidth, and scalable parallel processing, and it can exploit local I/O in a grid of clusters with tens of thousands of nodes. Gfarm parallel I/O APIs and commands provide a single filesystem image and manipulate filesystem metadata consistently. Fault tolerance and load balancing are automatically managed by file duplication or recomputation using a command history log. Preliminary performance evaluation has shown scalable disk I/O and network bandwidth on 64 nodes of the Presto III Athlon cluster. The Gfarm parallel I/O write and read operations have achieved data transfer rates of 1.74 GB/s and 1.97 GB/s, respectively, using 64 cluster nodes. The Gfarm parallel file copy reached 443 MB/s with 23 parallel streams on the Myrinet 2000. The Gfarm architecture is expected to enable petascale data-intensive Grid computing with an I/O bandwidth that scales to the TB/s range and scalable computational power.
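
The aggregate figures in the abstract can be turned into rough per-node and per-stream averages. The following is a minimal sketch of that arithmetic, not a method from the paper. It assumes decimal units (1 GB = 1000 MB, which the abstract does not specify), an even split of bandwidth across nodes or streams, and perfectly linear scaling for the extrapolation. The function names `per_unit_rate` and `nodes_for_target` are hypothetical helpers introduced only for illustration.

```python
import math

MB_PER_GB = 1000  # assumption: decimal units; the abstract does not say


def per_unit_rate(aggregate_mb_s: float, units: int) -> float:
    """Average rate per node or stream, assuming an even split."""
    if units <= 0:
        raise ValueError("units must be positive")
    return aggregate_mb_s / units


def nodes_for_target(target_mb_s: float, per_node_mb_s: float) -> int:
    """Nodes needed to reach a target rate, assuming linear scaling."""
    if per_node_mb_s <= 0:
        raise ValueError("per_node_mb_s must be positive")
    return math.ceil(target_mb_s / per_node_mb_s)


# Figures reported in the abstract (64 nodes, 23 copy streams).
write_per_node = per_unit_rate(1.74 * MB_PER_GB, 64)
read_per_node = per_unit_rate(1.97 * MB_PER_GB, 64)
copy_per_stream = per_unit_rate(443, 23)

# Hypothetical extrapolation to 1 TB/s of aggregate read bandwidth.
nodes_for_1tb_read = nodes_for_target(1000 * MB_PER_GB, read_per_node)

print(f"write per node:  {write_per_node:.1f} MB/s")
print(f"read per node:   {read_per_node:.1f} MB/s")
print(f"copy per stream: {copy_per_stream:.1f} MB/s")
print(f"nodes for 1 TB/s read (linear): {nodes_for_1tb_read}")
```

Under these assumptions the averages come to roughly 27 MB/s per node for writes, 31 MB/s per node for reads, and 19 MB/s per copy stream. The linear extrapolation to 1 TB/s lands in the tens of thousands of nodes, which matches the cluster scale the abstract mentions, though real scaling at that size is not established by the 64-node measurement.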