Mixing Hadoop and HPC workloads on parallel filesystems

Authors:
Esteban Molina-Estolano;Maya Gokhale;Carlos Maltzahn;John May;John Bent;Scott Brandt
Affiliations:
UC Santa Cruz;Lawrence Livermore National Laboratory;UC Santa Cruz;Lawrence Livermore National Laboratory;Los Alamos National Laboratory;UC Santa Cruz
Venue:
Proceedings of the 4th Annual Workshop on Petascale Data Storage
Year:
2009

Abstract

Distributed filesystems built for MapReduce, such as HDFS for Hadoop MapReduce, and parallel high-performance computing (HPC) filesystems are tailored for considerably different workloads. The purpose of our work is to examine the performance of each kind of filesystem when both sorts of workload run on it concurrently. We examine two workloads on two filesystems. For the HPC workload, we use the IOR checkpointing benchmark and the Parallel Virtual File System, Version 2 (PVFS); for Hadoop, we use an HTTP attack classifier and the CloudStore filesystem. We analyze the performance of each filesystem when it concurrently runs its "native" workload as well as the non-native workload.