CoScan: cooperative scan sharing in the cloud

  • Authors:
  • Xiaodan Wang; Christopher Olston; Anish Das Sarma; Randal Burns

  • Affiliations:
  • Johns Hopkins University, Baltimore, MD; Yahoo! Research, Sunnyvale, CA; Google Research, Mountain View, CA; Johns Hopkins University, Baltimore, MD

  • Venue:
  • Proceedings of the 2nd ACM Symposium on Cloud Computing
  • Year:
  • 2011

Abstract

We present CoScan, a scheduling framework that eliminates redundant processing in workflows that scan large batches of data in a map-reduce computing environment. CoScan merges Pig programs from multiple users at runtime to reduce I/O contention while adhering to soft deadline requirements in scheduling. This includes support for join workflows that operate on multiple data sources. Our solution maps well to workflows at many Internet companies which reuse data from a common set of inputs. Experiments on the PigMix data analytics benchmark show an orders-of-magnitude reduction in resource contention with minimal impact on latency.
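
The general idea of scan sharing under soft deadlines can be illustrated with a toy sketch. This is not CoScan's scheduling algorithm, which the abstract does not describe; it is a simple greedy batching rule chosen for illustration. The names `Job`, `plan_shared_scans`, and `scan_time`, the single-dataset-per-job restriction, and the fixed scan cost are all assumptions made here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job description; none of these fields come from the paper."""
    name: str
    dataset: str      # the one input this job scans (joins are not modeled)
    arrival: float    # time the job is submitted
    deadline: float   # soft deadline for completion


def plan_shared_scans(jobs, scan_time):
    """Greedily batch jobs that scan the same dataset.

    A batch shares one scan. It may keep waiting for later arrivals only
    while every member can still finish by its deadline, i.e. until
    min(deadline - scan_time) over the batch members.
    Assumes each job is feasible on its own (arrival + scan_time <= deadline).
    """
    by_dataset = defaultdict(list)
    for job in jobs:
        by_dataset[job.dataset].append(job)

    batches = []
    for dataset, group in by_dataset.items():
        group.sort(key=lambda j: j.arrival)
        current = []
        latest_start = float("inf")
        for job in group:
            # Close the batch if waiting for this job would make a member late.
            if current and job.arrival > latest_start:
                batches.append((dataset, [j.name for j in current]))
                current, latest_start = [], float("inf")
            current.append(job)
            latest_start = min(latest_start, job.deadline - scan_time)
        if current:
            batches.append((dataset, [j.name for j in current]))
    return batches


if __name__ == "__main__":
    demo = [
        Job("A", "clicks", arrival=0, deadline=10),
        Job("B", "clicks", arrival=2, deadline=12),
        Job("C", "clicks", arrival=9, deadline=20),
        Job("D", "users", arrival=1, deadline=8),
    ]
    plan = plan_shared_scans(demo, scan_time=5)
    print(f"{len(demo)} jobs served by {len(plan)} scans: {plan}")
```

In this sketch each batch corresponds to one pass over a dataset, so the number of batches stands in for I/O cost, while the deadline check limits how long a job waits for others to join its scan.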