Hybrid merge/overlap execution technique for parallel array processing

  • Authors:
  • Emad Soroush;Magdalena Balazinska

  • Affiliations:
  • University of Washington, Seattle;University of Washington, Seattle

  • Venue:
  • Proceedings of the EDBT/ICDT 2011 Workshop on Array Databases
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

Whether in business or science, multi-dimensional arrays are a common abstraction in data analytics and many systems exist for efficiently processing arrays. As dataset grow in size, it is becoming increasingly important to process these arrays in parallel. In this paper, we discuss different types of array operations and review how they can be processed in parallel using two different existing techniques. The first technique, which we call merge, consists in partitioning an array, processing the partitions in parallel, then merging the results to reconcile computations that span partition boundaries. The second technique, which we call overlap, consists in partitioning an array into subarrays that overlap by a given number of cells along each dimension. Thanks to this overlap, the array partitions can be processed in parallel without any merge phase. We discuss when each technique can be applied to an array operation. We show that even for a single array operation, a different approach may yield the best performance for different regions of an array. Following this observation, we introduce a new parallel array processing technique that combines the merge and overlap approaches. Our technique enables a parallel array processing system to mix-and-match the merge and overlap techniques within a single operation on an array. Through experiments on real, scientific data, we show that this hybrid approach outperforms the other two techniques.