The program dependence graph and its use in optimization
ACM Transactions on Programming Languages and Systems (TOPLAS)
SIGNAL: A declarative language for synchronous programming of real-time systems
Proc. of a conference on Functional programming languages and computer architecture
Estimating interlock and improving balance for pipelined architectures
Journal of Parallel and Distributed Computing
Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Enhanced modulo scheduling for loops with conditional branches
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
The ESTEREL synchronous programming language: design, semantics, implementation
Science of Computer Programming
Lifetime-sensitive modulo scheduling
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Selective specialization for object-oriented languages
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Modulo scheduling with multiple initiation intervals
Proceedings of the 28th annual international symposium on Microarchitecture
Modulo scheduling of loops in control-intensive non-numeric programs
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Software pipelining loops with conditional branches
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
GURPR—a method for global software pipelining
MICRO 20 Proceedings of the 20th annual workshop on Microprogramming
Unroll-and-jam using uniformly generated sets
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Modulo scheduling for the TMS320C6x VLIW DSP architecture
Proceedings of the ACM SIGPLAN 1999 workshop on Languages, compilers, and tools for embedded systems
Loop fusion for clustered VLIW architectures
Proceedings of the joint conference on Languages, compilers and tools for embedded systems: software and compilers for embedded systems
Conversion of control dependence to data dependence
POPL '83 Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
A stream compiler for communication-exposed architectures
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Imagine: Media Processing with Streams
IEEE Micro
Perfect Pipelining: A New Loop Parallelization Technique
ESOP '88 Proceedings of the 2nd European Symposium on Programming
FIAT: A Framework for Interprocedural Analysis and Transfomation
Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
Optimizing Loop Performance for Clustered VLIW Architectures
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
StreamIt: A Language for Streaming Applications
CC '02 Proceedings of the 11th International Conference on Compiler Construction
Interprocedural Analysis for Parallelization
LCPC '95 Proceedings of the 8th International Workshop on Languages and Compilers for Parallel Computing
Improving Software Pipelining With Unroll-and-Jam
HICSS '96 Proceedings of the 29th Hawaii International Conference on System Sciences Volume 1: Software Technology and Architecture
Techniques for Software Thread Integration in Real-Time Embedded Systems
RTSS '98 Proceedings of the IEEE Real-Time Systems Symposium
System-Level Issues for Software Thread Integration: Guest Triggering and Host Selection
RTSS '99 Proceedings of the 20th IEEE Real-Time Systems Symposium
Compiling for Fine-Grain Concurrency: Planning and Performing Software Thread Integration
RTSS '02 Proceedings of the 23rd IEEE Real-Time Systems Symposium
Procedure Cloning and Integration for Converting Parallelism from Coarse to Fine Grain
INTERACT '03 Proceedings of the Seventh Workshop on Interaction between Compilers and Computer Architectures
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Reaching fast code faster: using modeling for efficient software thread integration on a VLIW DSP
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Source level merging of independent programs
Journal of Parallel and Distributed Computing
Software thread integration for instruction-level parallelism
ACM Transactions on Embedded Computing Systems (TECS)
Hi-index | 0.00 |
Software pipelining is a critical optimization for producing efficient code for VLIW/EPIC and superscalar processors in high-performance embedded applications such as digital signal processing. Software thread integration (STI) can often improve the performance of looping code in cases where software pipelining performs poorly or fails. This paper examines both situations, presenting methods to determine what and when to integrate.We evaluate our methods on C-language image and digital signal processing libraries and synthetic loop kernels. We compile them for a very long instruction word (VLIW) digital signal processor (DSP) -- the Texas Instruments (TI) C64x architecture. Loops which benefit little from software pipelining (SWP-Poor) speed up by 26% (harmonic mean, HM). Loops for which software pipelining fails (SWP-Fail) due to conditionals and calls speed up by 16% (HM). Combining SWP-Good and SWP-Poor loops leads to a speedup of 55% (HM).