Whole-program optimization for time and space efficient threads

  • Authors:
  • Dirk Grunwald; Richard Neves

  • Affiliations:
  • Dept. of Computer Science, University of Colorado, Campus Box 430, Boulder, CO; IBM T.J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY

  • Venue:
  • Proceedings of the Seventh International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS VII)
  • Year:
  • 1996


Abstract

Modern languages and operating systems often encourage programmers to use threads, or independent control streams, to mask the overhead of some operations and to simplify program structure. Multitasking operating systems use threads to mask communication latency, either with hardware devices or users. Client-server applications typically use threads to simplify the complex control flow that arises when serving multiple clients. Recently, the scientific computing community has started using threads to mask network communication latency in massively parallel architectures, allowing computation and communication to be overlapped. Lastly, some architectures implement threads in hardware, using those threads to tolerate memory latency.

In general, it would be desirable if threaded programs could be written to expose the largest degree of parallelism possible, or to simplify the program design. However, threads incur time and space overheads, and programmers often compromise simple designs for performance. In this paper, we show how to reduce the time and space overhead of threads using control flow and register liveness information inferred after compilation. Our techniques work on binaries, are not specific to a particular compiler or thread library, and reduce the overall execution time of fine-grain threaded programs by ≈ 15-30%. We use execution-driven analysis and an instrumented operating system to show why execution time is reduced and to indicate areas for future work.
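
To illustrate the core idea (this sketch is not from the paper itself): a context switch need only save the registers that a post-compilation liveness analysis marks live at that switch point, rather than the full register file. The C sketch below simulates the register file with an array; the live_mask encoding, function names, and switch sites are hypothetical assumptions for illustration, whereas the paper's actual technique rewrites compiled binaries directly.

    /* Minimal sketch of liveness-aware context switching.
     * Instead of saving every register at a thread switch, save only
     * the registers that liveness analysis marked live at this site.
     * The register file is simulated; names here are illustrative. */
    #include <stdint.h>
    #include <inttypes.h>
    #include <stdio.h>

    #define NUM_REGS 16

    typedef struct {
        uint64_t regs[NUM_REGS];   /* saved register values */
        uint32_t live_mask;        /* bit r set => slot r holds live data */
    } thread_ctx;

    /* Save only the registers marked live at this switch site. */
    static void save_live(thread_ctx *ctx, const uint64_t cpu[NUM_REGS],
                          uint32_t live_mask) {
        ctx->live_mask = live_mask;
        for (int r = 0; r < NUM_REGS; r++)
            if (live_mask & (1u << r))
                ctx->regs[r] = cpu[r];
    }

    /* Restore only what was saved; dead registers need no reload. */
    static void restore_live(const thread_ctx *ctx, uint64_t cpu[NUM_REGS]) {
        for (int r = 0; r < NUM_REGS; r++)
            if (ctx->live_mask & (1u << r))
                cpu[r] = ctx->regs[r];
    }

    int main(void) {
        uint64_t cpu[NUM_REGS] = {0};
        thread_ctx t = {0};

        cpu[3] = 42; cpu[7] = 99;               /* pretend r3 and r7 are live */
        save_live(&t, cpu, (1u << 3) | (1u << 7));
        cpu[3] = cpu[7] = 0;                    /* another thread clobbers them */
        restore_live(&t, cpu);                  /* only r3 and r7 come back */
        printf("r3=%" PRIu64 " r7=%" PRIu64 "\n", cpu[3], cpu[7]);
        return 0;
    }

When liveness information shows few registers live at a switch point, which is common at fine-grain switch sites, the save/restore cost shrinks proportionally; this is the kind of per-site specialization the paper derives from binaries after compilation.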