Optimized Pipelined Parallel Merge Sort on the Cell BE

  • Authors:
  • Jörg Keller;Christoph W. Kessler

  • Affiliations:
  • Dept. of Math. and Computer Science, FernUniversität in Hagen, Hagen, Germany 58084;Dept. of Computer and Inf. Science, Linköpings Universitet, Linköping, Sweden 58183

  • Venue:
  • Euro-Par 2008 Workshops - Parallel Processing
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

Chip multiprocessors designed for streaming applications such as Cell BE offer impressive peak performance but suffer from limited bandwidth to off-chip main memory. As the number of cores is expected to rise further, this bottleneck will become more critical in the coming years. Hence, memory-efficient algorithms are required. As a case study, we investigate parallel sorting on Cell BE as a problem of great importance and as a challenge where the ratio between computation and memory transfer is very low. Our previous work led to a parallel mergesort that reduces memory bandwidth requirements by pipelining between SPEs, but the allocation of SPEs was rather ad-hoc. In our present work, we investigate mappings of merger nodes to SPEs. The mappings are designed to provide optimal trade-offs between load balancing, buffer memory consumption, and communication load on the on-chip bus. We solve this multi-objective optimization problem by deriving an integer linear programming formulation and compute Pareto-optimal solutions for the mapping of merge trees with up to 127 merger nodes. For mapping larger trees, we give a fast divide-and-conquer based approximation algorithm. We evaluate the sorting algorithm resulting from our mappings by a discrete event simulation.