Optimizing FastQuery performance on Lustre file system

Authors:
Kuan-Wu Lin; Surendra Byna; Jerry Chou; Kesheng Wu
Affiliations:
National Tsing Hua University, Hsinchu, Taiwan; Lawrence Berkeley National Laboratory, Berkeley, CA; National Tsing Hua University, Hsinchu, Taiwan; Lawrence Berkeley National Laboratory, Berkeley, CA
Venue:
Proceedings of the 25th International Conference on Scientific and Statistical Database Management
Year:
2013

Abstract

FastQuery is a parallel indexing and querying system we developed to accelerate the analysis and visualization of scientific data. In previous work, we applied it to a wide variety of HPC applications and demonstrated its capability and scalability on a petascale trillion-particle simulation. In our experience, however, the performance of reading and writing data with FastQuery, as with many other HPC applications, can be significantly affected by tunable parameters throughout the parallel I/O stack. In this paper, we describe how we tuned the performance of FastQuery on a Lustre parallel file system. We study and analyze the impact of parameters and tunable settings at the file system, MPI-IO library, and HDF5 library levels of the I/O stack. We demonstrate that a combined optimization strategy significantly improves the performance and I/O bandwidth of FastQuery. In our tests with a trillion-particle dataset, the time to index the dataset was reduced by more than half.
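
To make the three layers of the I/O stack concrete, the following minimal Python sketch groups one or two commonly used tunables per layer and shows how they are usually kept consistent with each other. The abstract does not say which parameters the paper tunes or what values it chose, so everything here is an assumption for illustration: the class name `IOTuning`, its field names, and all default values are hypothetical. The identifiers it emits are real: `lfs setstripe -c/-S` for Lustre, the ROMIO hints `striping_factor`, `striping_unit`, `cb_nodes`, `cb_buffer_size`, and `romio_cb_write` for MPI-IO, and the `(threshold, alignment)` argument pair of `H5Pset_alignment` for HDF5.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class IOTuning:
    """Hypothetical container for tunables at the three I/O layers.

    Default values are placeholders, not the settings used in the paper.
    """
    stripe_count: int = 8                    # Lustre: number of OSTs per file
    stripe_size: int = 4 * 1024 * 1024       # Lustre: bytes per stripe
    cb_nodes: int = 8                        # MPI-IO: collective-buffering aggregators
    cb_buffer_size: int = 16 * 1024 * 1024   # MPI-IO: bytes per aggregator buffer
    alignment_threshold: int = 1024 * 1024   # HDF5: align objects of at least this size

    def lfs_command(self, directory: str) -> str:
        """Build the `lfs setstripe` command that sets the Lustre layout."""
        return f"lfs setstripe -c {self.stripe_count} -S {self.stripe_size} {directory}"

    def mpiio_hints(self) -> Dict[str, str]:
        """Build ROMIO hint key/value pairs (MPI_Info values are strings)."""
        return {
            "striping_factor": str(self.stripe_count),
            "striping_unit": str(self.stripe_size),
            "cb_nodes": str(self.cb_nodes),
            "cb_buffer_size": str(self.cb_buffer_size),
            "romio_cb_write": "enable",
        }

    def hdf5_alignment(self) -> Tuple[int, int]:
        """Build (threshold, alignment) for H5Pset_alignment.

        Aligning to the stripe size is one common choice, assumed here.
        """
        return (self.alignment_threshold, self.stripe_size)


if __name__ == "__main__":
    tuning = IOTuning()
    print(tuning.lfs_command("/path/to/output"))
    for key, value in tuning.mpiio_hints().items():
        print(f"{key} = {value}")
    print(tuning.hdf5_alignment())
```

In a real application, the hint pairs would be passed to the MPI library through `MPI_Info_set` and the alignment pair to a file access property list through `H5Pset_alignment`. The sketch only derives a mutually consistent set of values from one place, which is the idea behind tuning the layers together rather than one at a time.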