A specimen of parallel programming: parallel merge sort implementation

  • Authors: Timothy J. Rolfe
  • Affiliations: Eastern Washington University, Cheney, Washington
  • Venue: ACM Inroads
  • Year: 2010

Abstract

One common example of parallel processing is the implementation of merge sort within a parallel processing environment. In the fully parallel model, you repeatedly split the sublists down to the point where you have single-element lists [3]. You then merge these in parallel back up the processing tree until you obtain the fully merged list at the top of the tree. While this model is of theoretical interest, you probably don't have the massively parallel processor that it would require.
Instead, you can use a mixed strategy. Determine the number of parallel processes you can realistically obtain within your computing environment. Then construct the processing tree so that it has that number of leaf nodes. Within the leaf nodes, simply use the best sequential algorithm to do the sorting, and send that result upstream to the internal nodes of the processing tree. Each internal node merges the sorted sublists it receives and then sends the resulting list farther upstream in the tree.
Figure One shows the processing tree for the case in which you have a list of 2000 items to be sorted and have resources sufficient for only four parallel processes. The processes receiving the size 500 lists use some sequential sorting algorithm. Because of the implementation environment, it will be something in the C/C++ language: either qsort() or your favorite implementation of a fast sorting algorithm. Each leaf node (with a size 500 list) then provides the sorted result to its parent process within the processing tree. That process merges the two lists to generate a size 1000 list, and then sends that result upstream to its own parent process. Finally, the root process in the processing tree merges the two lists to obtain a size 2000 list, fully sorted. If your environment supports more parallel processes, you might take the processing tree to four levels, so that eight processes do the sequential sorting of size 250 lists.
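The mixed strategy can be sketched in C roughly as follows. This is a minimal, sequential sketch and not the paper's implementation: each recursive call stands in for one process in the processing tree, and the names tree_sort, merge_runs, and cmp_int are hypothetical. In a real parallel version, the two recursive calls would run in separate processes and the merge would take place once both results had arrived.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Comparison callback for qsort(): ascending order of int values. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Merge sorted runs a[0..na) and b[0..nb) into out[0..na+nb). */
void merge_runs(const int *a, size_t na, const int *b, size_t nb, int *out)
{
    size_t i = 0, j = 0, k = 0;
    while (i < na && j < nb)
        out[k++] = (b[j] < a[i]) ? b[j++] : a[i++];
    while (i < na) out[k++] = a[i++];
    while (j < nb) out[k++] = b[j++];
}

/*
 * Sort data[0..n) with a processing tree that has up to 2^levels leaves.
 * levels == 0 marks a leaf node, which sorts sequentially with qsort().
 * An internal node splits its list in half, lets its two children sort
 * the halves, and merges their results.
 * Returns 0 on success, -1 if the scratch buffer cannot be allocated.
 */
int tree_sort(int *data, size_t n, unsigned levels)
{
    assert(data != NULL || n == 0);

    if (levels == 0 || n < 2) {
        qsort(data, n, sizeof *data, cmp_int);
        return 0;
    }

    size_t half = n / 2;

    /* Assumption: in a parallel version these two calls run concurrently,
       each in its own child process. Here they simply run one after another. */
    if (tree_sort(data, half, levels - 1) != 0) return -1;
    if (tree_sort(data + half, n - half, levels - 1) != 0) return -1;

    int *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL) return -1;
    merge_runs(data, half, data + half, n - half, tmp);
    memcpy(data, tmp, n * sizeof *tmp);
    free(tmp);
    return 0;
}
```

Calling tree_sort(list, 2000, 2) reproduces the four-leaf case described above: four leaves of 500 items each, two merges into 1000-item lists, and a final merge at the root. Passing 3 as the last argument gives eight leaves of 250 items.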
For that matter, you could even deal with circumstances in which the supported number of parallel processes is not an exact power of two. That just means that some of the leaf nodes will be at the bottommost level and some at a higher level in the processing tree. Because the time required in parallel processing is the time required by the slowest process, you will probably want to stick with circumstances where the number of leaf nodes is a power of two. In other words, the processing tree is a full binary tree with all of its leaves at the same level, and all leaf nodes do approximately the same amount of work.
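To make the cost of a non-power-of-two process count concrete, the following sketch computes where the leaves would sit. It assumes that every internal node splits its list in half, which the text does not state explicitly, and the names leaf_layout and plan_leaves are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of a processing tree with p leaf nodes. */
struct leaf_layout {
    unsigned shallow_depth;  /* depth of the higher leaves (root is depth 0) */
    size_t shallow_leaves;   /* number of leaves at shallow_depth */
    size_t deep_leaves;      /* number of leaves at shallow_depth + 1 */
    size_t largest_leaf;     /* items in the largest leaf list, rounded up */
};

/*
 * Lay out p leaves (p >= 1) for a list of n items, assuming each
 * internal node hands half of its list to each child.
 */
struct leaf_layout plan_leaves(size_t n, size_t p)
{
    assert(p >= 1);

    struct leaf_layout out = {0, 0, 0, 0};
    size_t full = 1;  /* largest power of two that is <= p */
    while (full * 2 <= p) {
        full *= 2;
        out.shallow_depth++;
    }

    /* Each process beyond 'full' turns one shallow leaf into two deeper ones. */
    out.deep_leaves = 2 * (p - full);
    out.shallow_leaves = p - out.deep_leaves;

    /* At least one shallow leaf always remains, and it holds about n / full items. */
    out.largest_leaf = (n + full - 1) / full;
    return out;
}
```

Under that halving assumption, plan_leaves(2000, 6) reports two leaves at depth 2 and four at depth 3, with the largest leaf still holding 500 items. That is the same leaf size as with four processes, so the two extra processes do not shorten the slowest sort.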