A cost-intelligent application-specific data layout scheme for parallel file systems

  • Authors:
  • Huaiming Song; Yanlong Yin; Yong Chen; Xian-He Sun

  • Affiliations:
  • Illinois Institute of Technology, Chicago, IL, USA; Illinois Institute of Technology, Chicago, IL, USA; Texas Tech University, Lubbock, TX, USA; Illinois Institute of Technology, Chicago, IL, USA

  • Venue:
  • Proceedings of the 20th International Symposium on High Performance Distributed Computing
  • Year:
  • 2011

Abstract

I/O data access is a recognized performance bottleneck in high-end computing. Several commercial and research parallel file systems have been developed in recent years to ease this bottleneck. These file systems perform well for some applications but not for others, and they have not reached their full potential in mitigating the I/O-wall problem. Data access is application dependent. Based on the principle of application-specific optimization, in this study we propose a cost-intelligent data access strategy to improve the performance of parallel file systems. We first present a novel model to estimate the data access cost of different data layout policies. Next, we extend the cost model to calculate the overall I/O cost of any given application and choose an appropriate layout policy for that application. A complex application may exhibit several different data access patterns, and averaging over those patterns may not be the best solution when no single pattern dominates. For such applications we further propose a hybrid data replication strategy, in which a file can have replicas stored under different layout policies so that each access pattern can use the layout that serves it best. Theoretical analysis and experimental testing have been conducted to verify the proposed cost-intelligent layout approach. Analytical and experimental results show that the proposed cost model is effective and that the application-specific data layout approach achieves up to 74% performance improvement for data-intensive applications.
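The selection step described in the abstract can be illustrated with a minimal sketch. It is not the paper's cost model, which the abstract does not specify. It assumes only that some model has already produced a per-request cost estimate for each layout policy and access pattern. The policy names (`layout_a`, `layout_b`), pattern names, cost values, and function names are all hypothetical. The sketch compares a single layout for the whole application against a hybrid choice where each pattern reads from the replica whose layout is cheapest for it.

```python
from typing import Dict

# Hypothetical inputs: estimated cost per request (arbitrary units),
# indexed as costs[policy][pattern]. In practice a cost model would
# produce these; the numbers here are made up for illustration.
CostTable = Dict[str, Dict[str, float]]


def overall_cost(policy_costs: Dict[str, float],
                 request_counts: Dict[str, int]) -> float:
    """Total estimated I/O cost of one layout policy for an application,
    summed over the application's access patterns."""
    return sum(policy_costs[pattern] * count
               for pattern, count in request_counts.items())


def choose_single_layout(costs: CostTable,
                         request_counts: Dict[str, int]) -> str:
    """Pick the one policy with the lowest overall estimated cost."""
    return min(costs, key=lambda policy: overall_cost(costs[policy],
                                                      request_counts))


def choose_hybrid_layout(costs: CostTable,
                         request_counts: Dict[str, int]) -> Dict[str, str]:
    """For each access pattern, pick the policy that is cheapest for that
    pattern alone (assumes a replica exists under each chosen policy)."""
    return {pattern: min(costs, key=lambda policy: costs[policy][pattern])
            for pattern in request_counts}


def hybrid_cost(costs: CostTable, request_counts: Dict[str, int]) -> float:
    """Total estimated cost when each pattern uses its best replica."""
    plan = choose_hybrid_layout(costs, request_counts)
    return sum(costs[plan[pattern]][pattern] * count
               for pattern, count in request_counts.items())


if __name__ == "__main__":
    example_costs: CostTable = {
        "layout_a": {"small_sequential": 1.0, "large_strided": 4.0},
        "layout_b": {"small_sequential": 3.0, "large_strided": 1.5},
    }
    example_counts = {"small_sequential": 100, "large_strided": 100}
    print(choose_single_layout(example_costs, example_counts))
    print(choose_hybrid_layout(example_costs, example_counts))
    print(hybrid_cost(example_costs, example_counts))
```

In this made-up example no pattern dominates, so the best single layout is still a compromise, while the hybrid plan lowers the estimated read cost. The sketch ignores the storage and write-propagation overhead of keeping replicas, which a fuller comparison would need to account for.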