Due to the ever-widening performance gap between processors and disks, I/O operations tend to become the major performance bottleneck of data-intensive applications on modern clusters. If all the existing disks on the nodes of a cluster are connected together to establish high-performance parallel storage systems, the cluster's overall performance can be boosted at no additional cost. CEFT-PVFS (a RAID 10 style parallel file system that extends the original PVFS), as one such system, divides the cluster nodes into two groups, stripes the data across one group in a round-robin fashion, and then duplicates the same data to the other group to provide a storage service of high performance and high reliability. Previous research has shown that mirroring improves system reliability by a factor of more than 40 while maintaining comparable write performance. This paper presents another benefit of CEFT-PVFS: the aggregate peak read performance can be improved by as much as 100% over that of the original PVFS by exploiting the increased parallelism.

Additionally, when the data servers, which typically are also computational nodes in a cluster environment, are loaded in an unbalanced way by applications running in the cluster, the read performance of PVFS degrades significantly. In contrast, in CEFT-PVFS a heavily loaded data server can be skipped and all the desired data read from its mirroring node. Thus the performance is not affected unless both the server node and its mirroring node are heavily loaded.
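
The read scheme described above can be illustrated with a minimal Python sketch. It is not the CEFT-PVFS implementation: the `Server` class, the `load` metric, the `busy_threshold` value, and the rule for alternating between the two groups are all assumptions made for illustration. The sketch maps each stripe to a server pair in round-robin fashion, spreads successive passes over the stripes across the primary and mirror groups so that both groups serve reads, and skips a heavily loaded server in favor of its mirror when the mirror is not itself heavily loaded.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    load: float  # hypothetical load metric: 0.0 (idle) to 1.0 (saturated)


def plan_read(num_stripes, primary, mirror, busy_threshold=0.8):
    """Choose one server per stripe for a read (illustrative policy only).

    primary[i] and mirror[i] are assumed to hold identical data.
    busy_threshold is an assumed cutoff for "heavily loaded".
    """
    if len(primary) != len(mirror) or not primary:
        raise ValueError("groups must be non-empty and of equal size")
    n = len(primary)
    plan = []
    for stripe in range(num_stripes):
        i = stripe % n                      # round-robin striping within a group
        pair = (primary[i], mirror[i])
        pick = (stripe // n) % 2            # alternate groups on each pass
        chosen, partner = pair[pick], pair[1 - pick]
        # Skip a heavily loaded server only if its mirror is not also loaded.
        if chosen.load >= busy_threshold and partner.load < busy_threshold:
            chosen = partner
        plan.append(chosen.name)
    return plan


if __name__ == "__main__":
    primary = [Server("p0", 0.1), Server("p1", 0.9)]  # p1 is heavily loaded
    mirror = [Server("m0", 0.1), Server("m1", 0.2)]
    print(plan_read(4, primary, mirror))  # ['p0', 'm1', 'm0', 'm1']
```

In this toy run, the stripe that would go to the loaded server `p1` is redirected to its mirror `m1`, while the remaining stripes are spread over both groups. If both `p1` and `m1` were above the threshold, the plan would fall back to the originally chosen server, which matches the abstract's remark that performance suffers only when a server and its mirror are both heavily loaded.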