We describe a technique for merging two (and hence more) independent C programs at source level. Because the programs are independent, the merged program exposes more parallelism for the underlying compiler and CPU to extract. The merged program is therefore expected to run faster than the two programs executed separately. The usefulness of such merging for embedded systems has been studied and demonstrated by the work of Dean and others on the Thrint compiler, which merges threads at assembly level.

The main contribution of this work is an efficient algorithm for matching sub-components that considers their internal structure and not only their execution frequency. Two novel techniques for balancing the merge of sub-components are presented:

* Residual loop merging (RLM), a way to merge loops with different nesting and execution frequency levels.
* Reusing the remaining iterations left over after merging two non-equal loops (loops with different numbers of iterations) in future mergings of other loops.

These two abilities allow the proposed algorithm to simplify the matching process and to overcome merging problems related to deeply nested structures. We also consider the problem of merging function calls and make extensive use of cloning (and not only inlining, as in previous work). The final tool is the first complete system for merging C programs at source level that supports profile-based and structure-based matching.

The main use of merging is to speed up embedded systems, which usually execute independent threads or processes that can potentially be merged. Our experimental results suggest that the proposed merging technique can speed up the execution of two independent programs by 10%-20% for about half of the mergings tested.
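To make the idea of merging two non-equal loops concrete, here is a minimal hand-written sketch in C. It is not the paper's algorithm or the output of its tool: the function names (`sum_a`, `max_b`, `merged_sum_max`) and the two toy "programs" are hypothetical, chosen only for illustration. The bodies of two independent loops are interleaved over their common iteration range, and the leftover iterations of the longer loop run in a plain remainder loop. The abstract proposes instead to reuse such leftover iterations in later mergings with other loops, which this sketch does not attempt.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical independent program A: sum of n_a elements. */
long sum_a(const int *a, size_t n_a) {
    long s = 0;
    for (size_t i = 0; i < n_a; i++)
        s += a[i];
    return s;
}

/* Hypothetical independent program B: maximum of n_b elements
 * (INT_MIN when the array is empty). */
int max_b(const int *b, size_t n_b) {
    int m = INT_MIN;
    for (size_t i = 0; i < n_b; i++)
        if (b[i] > m)
            m = b[i];
    return m;
}

/* Hand-merged version of the two loops above (illustrative only). */
void merged_sum_max(const int *a, size_t n_a,
                    const int *b, size_t n_b,
                    long *sum_out, int *max_out) {
    long s = 0;
    int m = INT_MIN;
    size_t common = n_a < n_b ? n_a : n_b;
    size_t i;

    /* Merged loop: the two bodies share no data, so the compiler and
     * CPU are free to overlap their instructions. */
    for (i = 0; i < common; i++) {
        s += a[i];
        if (b[i] > m)
            m = b[i];
    }

    /* Remaining iterations of whichever loop was longer.
     * At most one of these two loops executes any iterations. */
    for (size_t j = i; j < n_a; j++)
        s += a[j];
    for (size_t j = i; j < n_b; j++)
        if (b[j] > m)
            m = b[j];

    *sum_out = s;
    *max_out = m;
}
```

Calling `merged_sum_max` produces the same results as calling `sum_a` and `max_b` separately. Whether it actually runs faster depends on the compiler, the target CPU, and the loop bodies, which is why a real merging tool would rely on profile and structure information to choose what to merge.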