A New File-Specific Stripe Size Selection Method for Highly Concurrent Data Access

Authors:
Bin Dong;Xiuqiao Li;Limin Xiao;Li Ruan
Venue:
GRID '12 Proceedings of the 2012 ACM/IEEE 13th International Conference on Grid Computing
Year:
2012

Abstract

Data-intensive scientific applications running on high-end computing systems depend on parallel file systems for high-speed data input/output. In most parallel file systems, a file is partitioned into multiple subfiles so that it can be accessed concurrently. An important factor in this partitioning is the stripe size. However, while they work well for certain applications, most existing schemes for determining the stripe size of a file still lack the ability to handle highly concurrent data accesses, which are typical of most parallel scientific applications. To address this problem, this paper first presents an analytic model for assessing the performance of highly concurrent data accesses and then describes how to apply the model to select the stripe size of a file. Experimental results demonstrate that the accuracy of the analytic model is around 87.89% and that the stripe size selected with it can improve the aggregate I/O bandwidth of FLASH I/O by up to 5.8 times compared with well-known methods. The paper also discusses how to incorporate the method into real-world parallel file systems.
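
To illustrate why the stripe size matters for concurrent access, here is a minimal sketch of plain round-robin striping, the layout commonly used by parallel file systems. It is not the paper's analytic model, which the abstract does not spell out. The function names `locate` and `servers_touched` and all parameter names are hypothetical, and the layout assumes that stripe `k` is stored on server `k mod num_servers`.

```python
def locate(offset, stripe_size, num_servers):
    """Map a logical file offset to (server index, offset inside that server's subfile).

    Assumes simple round-robin striping: stripe k lives on server k % num_servers.
    """
    stripe_index = offset // stripe_size
    server = stripe_index % num_servers
    # Full stripes already stored on this server, plus the position inside the stripe.
    local_offset = (stripe_index // num_servers) * stripe_size + offset % stripe_size
    return server, local_offset


def servers_touched(offset, length, stripe_size, num_servers):
    """Return the set of servers that the request [offset, offset + length) touches."""
    if length <= 0:
        return set()
    first = offset // stripe_size
    last = (offset + length - 1) // stripe_size
    # A request can never touch more than num_servers distinct servers.
    count = min(last - first + 1, num_servers)
    return {(first + i) % num_servers for i in range(count)}


if __name__ == "__main__":
    # The same 256-byte request under two illustrative stripe sizes on 4 servers.
    print(servers_touched(0, 256, stripe_size=64, num_servers=4))
    print(servers_touched(0, 256, stripe_size=256, num_servers=4))
```

With a small stripe size a single request is spread over many servers, while a large stripe size keeps it on one. When many processes issue requests at once, this choice determines how their requests overlap on each server, which is the trade-off a stripe size selection method has to weigh.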