Portable (standards-compliant) systems software is usually associated with unavoidable overhead from the standards-prescribed interface. For example, consider the POSIX Threads standard facility for using thread-specific data (TSD) to implement multithreaded code. The first TSD reference must be preceded by a call to pthread_getspecific(), typically implemented as a function or macro with 40-50 instructions. This paper proposes a method that uses the run-time specialization facility of the Tempo program specializer to convert such calls into simple memory references of one or two instructions at execution time. Consequently, the source code remains standards-compliant while the executed code performs similarly to direct global variable access. Measurements show significant performance gains over a range of code sizes. A random number generator (10 lines of C) shows a speedup of 4.8 times on a SPARC and 2.2 times on a Pentium. A time converter (2,800 lines) was sped up by 14 and 22 percent, respectively, and a parallel genetic algorithm system (14,000 lines) was sped up by 13 and 5 percent.
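To make the overhead concrete, the following is a minimal sketch of a TSD-based random number generator next to the same generator over a plain global variable. It is not the paper's benchmark code and not Tempo output: the names seed_key, make_seed_key, tsd_rand, and global_rand are hypothetical, and the linear congruential constants are an assumed, commonly used choice. The sketch only illustrates that the portable version pays for a pthread_getspecific() lookup on every call, whereas the global version, which approximates what specialized code would execute, reduces to a direct memory reference.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names for illustration; none of these come from the paper. */
static pthread_key_t seed_key;
static pthread_once_t seed_once = PTHREAD_ONCE_INIT;

static void make_seed_key(void) {
    /* free() is registered as the destructor for each thread's seed. */
    pthread_key_create(&seed_key, free);
}

/* Portable version: every call goes through pthread_getspecific(). */
uint32_t tsd_rand(void) {
    pthread_once(&seed_once, make_seed_key);
    uint32_t *seed = pthread_getspecific(seed_key);
    if (seed == NULL) {
        /* First use in this thread: allocate and register its seed. */
        seed = malloc(sizeof *seed);
        if (seed == NULL) abort();
        *seed = 1u;
        pthread_setspecific(seed_key, seed);
    }
    /* Assumed LCG constants; the arithmetic wraps modulo 2^32. */
    *seed = *seed * 1664525u + 1013904223u;
    return *seed;
}

/* Reference version: direct global access (not thread-safe). */
static uint32_t global_seed = 1u;

uint32_t global_rand(void) {
    global_seed = global_seed * 1664525u + 1013904223u;
    return global_seed;
}
```

Called from a single thread, tsd_rand() and global_rand() produce the same sequence; the difference is only in how the seed is located on each call, which is the cost the abstract describes.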