Mixing Hadoop and HPC workloads on parallel filesystems

  • Authors:
  • Esteban Molina-Estolano; Maya Gokhale; Carlos Maltzahn; John May; John Bent; Scott Brandt

  • Affiliations:
  • UC Santa Cruz; Lawrence Livermore National Laboratory; UC Santa Cruz; Lawrence Livermore National Laboratory; Los Alamos National Laboratory; UC Santa Cruz

  • Venue:
  • Proceedings of the 4th Annual Workshop on Petascale Data Storage
  • Year:
  • 2009

Abstract

MapReduce-tailored distributed filesystems, such as HDFS for Hadoop MapReduce, and parallel high-performance computing (HPC) filesystems are designed for considerably different workloads. The purpose of our work is to examine the performance of each filesystem when both kinds of workload run on it concurrently. We examine two workloads on two filesystems. For the HPC workload, we use the IOR checkpointing benchmark and the Parallel Virtual File System, Version 2 (PVFS). For Hadoop, we use an HTTP attack classifier and the CloudStore filesystem. We analyze the performance of each filesystem when it concurrently runs its "native" workload and the non-native workload.