Effective compilation of packet processing applications onto the Intel IXP network processors requires, among other things, the automatic use of multiple threads on one or more processing elements and the automatic introduction of the synchronization needed to enforce dependences between those threads. We describe the program transformation used in the Intel Auto-partitioning C Compiler for IXP to automatically multithread and multi-process a program for the IXP. The transformation consists of four steps: introducing inter-thread signaling to enforce dependences, optimizing the placement of that signaling, reducing the number of signals in use to the number available in hardware, and transforming the initialization code so that it executes correctly in the multithreaded version. Experimental results show that our method provides substantial speedup for six packet processing stages (PPSes) in the widely used NPF IP forwarding benchmarks. For most PPSes, our algorithms achieve almost linear performance improvement after the automatic multithreading transformation. The automatic multi-processing transformation further boosts the speedup of two PPSes.
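To make the idea of inter-thread signaling concrete, the following is a minimal sketch in Python and not the compiler's actual output. It assumes a ring of worker threads in which each thread handles every N-th packet, does its independent work freely, and waits for a signal from the previous thread before entering the dependent section. The names `run_pipeline`, `signals`, and `order` are hypothetical, and semaphores stand in for the hardware signals.

```python
import threading


def run_pipeline(packets, num_threads=4):
    """Process packets on several threads while keeping the dependent
    section in packet order (illustrative sketch, not IXP code)."""
    # One semaphore per thread models an inter-thread signal (assumption).
    signals = [threading.Semaphore(0) for _ in range(num_threads)]
    order = []  # shared state, touched only inside the dependent section

    def worker(tid):
        # Thread tid handles packets tid, tid + N, tid + 2N, ...
        for i in range(tid, len(packets), num_threads):
            parsed = packets[i].upper()   # independent work, may overlap
            signals[tid].acquire()        # wait for the previous thread
            order.append((i, parsed))     # dependent section, in order
            signals[(tid + 1) % num_threads].release()  # signal next thread

    signals[0].release()  # initial signal: thread 0 owns the first packet
    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order
```

Calling `run_pipeline(["a", "b", "c"], num_threads=2)` returns the packets in their original order even though two threads share the work. The wait is placed as late as possible and the signal as early as possible, which is the kind of placement the optimization step aims for.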