A provably time-efficient parallel implementation of full speculation

  • Authors:
  • John Greiner;Guy E. Blelloch

  • Affiliations:
  • Rice Univ., Houston, TX;Carnegie Mellon Univ., Pittsburgh, PA

  • Venue:
  • ACM Transactions on Programming Languages and Systems (TOPLAS)
  • Year:
  • 1999

Quantified Score

Hi-index 0.00

Visualization

Abstract

Speculative evaluation, including leniency and futures, is often used to produce high degrees of parallelism. Understanding the performance characteristics of such evaluation, however, requires having a detailed understanding of the implementation. For example, the particular implementaion technique used to suspend and reactivate threads can have an asymptotic effect on performance. With the goal of giving the users some understanding of performance without requiring them to understand the implementation, we present a provable implementation bound for a language based on speculative evaluation. The idea is (1) to supply the users with a semantics for a language that defines abstract costs for measuring or analyzing the performance of computations, (2) to supply the users with a mapping of these costs onto runtimes on various machine models, and (3) to describe an implementation strategy of the language and prove that it meets these mappings. For this purpose we consider a simple language based on speculative evaluation. For every computation, the semantics of the language returns a directed acyclic graph (DAG) in which each node represents a unit of computation, and each edge represents a dependence. We then describe an implementation strategy of the language and show that any computation with w work (the number of nodes in the DAG) and d depth (the length of the longest path in the DAG) will run on a p-processor PRAM in O(w/p + d log p) time. The bounds are work efficient (within a constant factor of linear speedup) when there is sufficient parallelism, w/d ≥ p log p. These are the first time bounds we know of for languages with speculative evaluation. The main challenge is in parallelizing the necessary queuing operations on suspended threads.