Compilers: principles, techniques, and tools
Compilers: principles, techniques, and tools
Global register allocation at link time
SIGPLAN '86 Proceedings of the 1986 SIGPLAN symposium on Compiler construction
The Mahler experience: using an intermediate language as the machine description
ASPLOS II Proceedings of the second international conference on Architectual support for programming languages and operating systems
A portable global optimizer and linker
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
A simple interprocedural register allocation algorithm and its effectiveness for LISP
ACM Transactions on Programming Languages and Systems (TOPLAS)
Register allocation across procedure and module boundaries
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Representing control in the presence of first-class continuations
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
The interaction of architecture and operating system design
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Page placement algorithms for large real-indexed caches
ACM Transactions on Computer Systems (TOCS)
Space-efficient closure representations
LFP '94 Proceedings of the 1994 ACM conference on LISP and functional programming
ATOM: a system for building customized program analysis tools
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Performance of a hardware-assisted real-time garbage collector
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
A hybrid execution model for fine-grained languages on distributed memory multicomputers
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
ICS '90 Proceedings of the 4th international conference on Supercomputing
An overview of the mesa processor architecture
ASPLOS I Proceedings of the first international symposium on Architectural support for programming languages and operating systems
ASPLOS I Proceedings of the first international symposium on Architectural support for programming languages and operating systems
SPLASH: Stanford parallel applications for shared-memory
SPLASH: Stanford parallel applications for shared-memory
Exploiting dead value information
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The implementation of the Cilk-5 multithreaded language
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Pthreads for dynamic and irregular parallelism
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Balancing register allocation across threads for a multithreaded network processor
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
MTSS: multi task stack sharing for embedded systems
Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
Offline compression for on-chip ram
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
MTSS: Multitask stack sharing for embedded systems
ACM Transactions on Embedded Computing Systems (TECS)
Eliminating the call stack to save RAM
Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Compiler support for lightweight context switching
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Hi-index | 0.00 |
Modern languages and operating systems often encourage programmers to use threads, or independent control streams, to mask the overhead of some operations and simplify program structure. Multitasking operating systems use threads to mask communication latency, either with hardwares devices or users. Client-server applications typically use threads to simplify the complex control-flow that arises when multiple clients are used. Recently, the scientific computing community has started using threads to mask network communication latency in massively parallel architectures, allowing computation and communication to be overlapped. Lastly, some architectures implement threads in hardware, using those threads to tolerate memory latency.In general, it would be desirable if threaded programs could be written to expose the largest degree of parallelism possible, or to simplify the program design. However, threads incur time and space overheads, and programmers often compromise simple designs for performance. In this paper, we show how to reduce time and space thread overhead using control flow and register liveness information inferred after compilation. Our techniques work on binaries, are not specific to a particular compiler or thread library and reduce the the overall execution time of fine-grain threaded programs by ≈ 15-30%. We use execution-driven analysis and an instrumented operating system to show why the execution time is reduced and to indicate areas for future work.