SUDS: automatic parallelization for Raw processors

  • Authors:
  • Matthew Ian Frank;Anant Agarwal;Saman Amarasinghe

  • Venue:
  • SUDS: automatic parallelization for Raw processors
  • Year:
  • 2003

Abstract

A computer can never be too fast or too cheap. Computer systems pervade nearly every aspect of science, engineering, communications and commerce because they perform certain tasks at rates unachievable by any other kind of system built by humans. A computer system's throughput, however, is constrained by that system's ability to find concurrency. Given a particular target workload, the computer architect's role is to design mechanisms to find and exploit the available concurrency in that workload. This thesis describes SUDS (Software Un-Do System), a compiler and runtime system that can automatically find and exploit the available concurrency of scalar operations in imperative programs with arbitrary unstructured and unpredictable control flow. The core compiler transformation that enables this is scalar queue conversion. Scalar queue conversion makes scalar renaming an explicit operation through a process similar to closure conversion, a technique traditionally used to compile functional languages. The scalar queue conversion compiler transformation is speculative, in the sense that it may introduce dynamic memory allocation operations into code that would not otherwise dynamically allocate memory. Thus, SUDS also includes a transactional runtime system that periodically checkpoints machine state, executes code speculatively, checks whether the speculative execution produced results consistent with the original sequential program semantics, and then either commits or rolls back the speculative execution path. In addition to safely running scalar-queue-converted code, the SUDS runtime system safely permits threads to speculatively run in parallel and concurrently issue memory operations, even when the compiler is unable to prove that the reordered memory operations will always produce correct results.
Using this combination of compile-time and runtime techniques, SUDS can find concurrency in programs where previous compiler-based renaming techniques fail because the programs contain unstructured loops, and where Tomasulo's algorithm fails because it sequentializes mispredicted branches. Indeed, we describe three application programs, with unstructured control flow, where the prototype SUDS system, running in software on a Raw microprocessor, achieves speedups equivalent to, or better than, an idealized, and unrealizable, model of a hardware implementation of Tomasulo's algorithm. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)
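
The checkpoint, speculate, check, and commit-or-roll-back cycle described in the abstract can be illustrated with a small sketch. This is not the SUDS runtime or its interface: the function `run_speculatively`, the `read`/`write` callbacks, and the dictionary standing in for memory are all hypothetical, and the "parallel" tasks are simulated one after another in a single thread. The sketch assumes a simple conflict rule, namely that a task which reads a cell written by an earlier task in program order has seen stale data.

```python
import copy

def run_speculatively(state, tasks):
    """Run tasks (given in program order) against state, a dict of memory cells.

    Hypothetical illustration only. Each task is a function taking
    (read, write) callbacks. Returns "committed" or "rolled back".
    """
    # 1. Checkpoint: every speculative task reads from this snapshot.
    checkpoint = copy.deepcopy(state)

    # 2. Speculate: run each task as if it were alone, buffering its writes
    #    and logging which cells it read from the checkpoint.
    logs = []
    for task in tasks:
        reads, writes = set(), {}

        def read(addr):
            if addr in writes:          # a task sees its own buffered writes
                return writes[addr]
            reads.add(addr)
            return checkpoint[addr]

        def write(addr, value):
            writes[addr] = value

        task(read, write)
        logs.append((reads, writes))

    # 3. Check: compare against sequential semantics. A read of a cell that
    #    an earlier task wrote would have returned a different value.
    written_so_far = set()
    consistent = True
    for reads, writes in logs:
        if reads & written_so_far:
            consistent = False
            break
        written_so_far.update(writes)

    # 4a. Commit: apply the buffered writes in program order.
    if consistent:
        for _, writes in logs:
            state.update(writes)
        return "committed"

    # 4b. Roll back: discard the buffers (state still equals the checkpoint)
    #     and re-execute the tasks sequentially.
    for task in tasks:
        task(state.__getitem__, state.__setitem__)
    return "rolled back"
```

Two tasks that touch disjoint cells commit, while a task that consumes a value produced by its predecessor forces a rollback and sequential re-execution. Buffering writes makes the rollback trivial here; a real system that updates state in place would have to restore the checkpoint instead.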