Cronus: A platform for parallel code generation based on computational geometry methods

  • Authors:
  • Theodore Andronikos; Florina M. Ciorba; Panayiotis Theodoropoulos; Dimitrios Kamenopoulos; George Papakonstantinou

  • Affiliations:
  • Department of Informatics, Ionian University, 7, Tsirigoti Square, 49100 Corfu, Greece (Theodore Andronikos); Computing Systems Laboratory, Department of Electrical and Computer Engineering, National Technical University of Athens, Zografou Campus, 15773 Athens, Greece (Florina M. Ciorba, Panayiotis Theodoropoulos, Dimitrios Kamenopoulos, George Papakonstantinou)

  • Venue:
  • Journal of Systems and Software
  • Year:
  • 2008

Abstract

This paper describes Cronus, a platform for parallelizing general nested loops. General nested loops contain complex loop bodies (assignments, conditionals, repetitions) and exhibit uniform loop-carried dependencies. The novelty of Cronus is twofold: (1) it determines the optimal scheduling hyperplane using the QuickHull algorithm, which is more efficient than previously used methods, and (2) it implements a simple and efficient dynamic rule (successive dynamic scheduling) for the runtime scheduling of the loop iterations along the optimal hyperplane. This scheduling policy enhances data locality and improves the makespan. Cronus provides an efficient runtime library, designed specifically to minimize communication, that performs better than more generic systems such as Berkeley UPC. Its performance was evaluated through extensive testing on three representative case studies: the Floyd-Steinberg dithering algorithm, the Transitive Closure algorithm, and the FSBM motion estimation algorithm. The experimental results corroborate the efficiency of the generated parallel code: speedups range from 1.18 (out of an ideal 4) to 12.29 (out of an ideal 16) on distributed systems, and from 3.60 (out of 4) to 15.79 (out of 16) on shared-memory systems. Cronus outperforms UPC by 5-95%, depending on the test case.
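The abstract names QuickHull as the way Cronus finds the optimal scheduling hyperplane but does not give the method itself. The sketch below does not reproduce QuickHull; it only illustrates what "optimal hyperplane" means for a uniform-dependence loop, using a plain exhaustive search over small integer vectors as a stand-in. Several assumptions are made and should be read as such: a rectangular 2-D index space, dependence vectors taken from the standard Floyd-Steinberg error-diffusion kernel for a row-major scan, and the common linear schedule in which iteration j runs at time floor(Π·j / disp), with disp the smallest Π·d over all dependence vectors d. All names (`DEPS`, `is_valid`, `makespan`, `best_hyperplane`) are hypothetical.

```python
from itertools import product

# Assumed dependence vectors (row, col) for a row-major Floyd-Steinberg scan:
# pixel (i, j) needs error from (i, j-1), (i-1, j+1), (i-1, j), (i-1, j-1).
DEPS = [(0, 1), (1, -1), (1, 0), (1, 1)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def is_valid(pi, deps):
    """A linear schedule is legal only if every dependence moves forward in time."""
    return all(dot(pi, d) > 0 for d in deps)


def makespan(pi, deps, n_rows, n_cols):
    """Number of time steps when iteration j runs at floor(pi . j / disp).

    disp = min over dependences of pi . d, so a dependent iteration always
    lands at least one step later. A linear function attains its extremes at
    the corners of the rectangle, and floor is monotone, so corners suffice.
    """
    disp = min(dot(pi, d) for d in deps)
    corners = product((0, n_rows - 1), (0, n_cols - 1))
    times = [dot(pi, c) // disp for c in corners]
    return max(times) - min(times) + 1


def best_hyperplane(deps, n_rows, n_cols, bound=3):
    """Exhaustive search over small integer vectors (illustrative, NOT QuickHull)."""
    best, best_len = None, None
    for pi in product(range(-bound, bound + 1), repeat=2):
        if not is_valid(pi, deps):  # also rejects (0, 0)
            continue
        length = makespan(pi, deps, n_rows, n_cols)
        if best_len is None or length < best_len:
            best, best_len = pi, length
    return best, best_len


if __name__ == "__main__":
    print(best_hyperplane(DEPS, 100, 100))  # prints ((2, 1), 298)
```

Under these assumptions, a 100×100 image yields Π = (2, 1) and 298 time steps, the familiar wavefront that advances two columns per row. Π = (1, 1) is rejected because the dependence (1, -1) would not move forward in time. Exhaustive search is only practical for tiny dimensions and bounds; that cost is the motivation for a geometric method such as the QuickHull-based one the paper describes.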