Achieving Efficiency and Portability in Systems Software: A Case Study on POSIX-Compliant Multithreaded Programs

Authors:
Yasushi Shinjo;Calton Pu
Affiliations:
IEEE;IEEE
Venue:
IEEE Transactions on Software Engineering
Year:
2005

Abstract

Portable (standards-compliant) systems software is usually associated with unavoidable overhead from the standards-prescribed interface. For example, consider the POSIX Threads standard facility for using thread-specific data (TSD) to implement multithreaded code. The first TSD reference must be preceded by a call to pthread_getspecific(), which is typically implemented as a function or macro of 40-50 instructions. This paper proposes a method that uses the runtime specialization facility of the Tempo program specializer to convert such seemingly unavoidable source code into simple memory references that execute in one or two instructions. Consequently, the source code remains standards-compliant while the executed code performs similarly to direct global variable access. Measurements show significant performance gains over a range of code sizes. A random number generator (10 lines of C) shows a speedup of 4.8 times on a SPARC and 2.2 times on a Pentium. A time converter (2,800 lines) was sped up by 14 percent on the SPARC and 22 percent on the Pentium, and a parallel genetic algorithm system (14,000 lines) was sped up by 13 and 5 percent, respectively.
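To make the overhead described in the abstract concrete, the following minimal sketch contrasts a TSD-based random number generator with a version that reads a plain global variable. It is an illustration only, not the code from the paper: the names seed_key, tsd_rand, and global_rand are hypothetical, the generator constants are the well-known example LCG from the C standard, and the Tempo specialization step itself is not shown.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical names for illustration; none of these come from the paper. */
static pthread_key_t seed_key;
static pthread_once_t seed_key_once = PTHREAD_ONCE_INIT;

/* Create the TSD key exactly once; free() releases each thread's seed at exit. */
static void make_seed_key(void)
{
    int rc = pthread_key_create(&seed_key, free);
    assert(rc == 0);
    (void)rc; /* avoid an unused-variable warning when NDEBUG is defined */
}

/* Portable, thread-safe version: every call goes through
 * pthread_getspecific() before it can touch the per-thread seed. */
unsigned tsd_rand(void)
{
    pthread_once(&seed_key_once, make_seed_key);

    unsigned *seed = pthread_getspecific(seed_key);
    if (seed == NULL) {
        /* First use in this thread: allocate and register the seed. */
        seed = malloc(sizeof *seed);
        if (seed == NULL)
            abort();
        *seed = 1u;
        if (pthread_setspecific(seed_key, seed) != 0)
            abort();
    }

    *seed = *seed * 1103515245u + 12345u;
    return (*seed >> 16) & 0x7fffu;
}

/* Single-threaded baseline: the seed is a direct global memory reference.
 * This is the access cost the abstract compares the specialized code to. */
static unsigned global_seed = 1u;

unsigned global_rand(void)
{
    global_seed = global_seed * 1103515245u + 12345u;
    return (global_seed >> 16) & 0x7fffu;
}
```

Both functions compute the same sequence when started from the same seed. The difference lies only in how the seed is located, which is the part the abstract reports as reducible to one or two instructions.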