Mapping packet processing tasks onto network processor micro-engines involves complex tradeoffs related to maximizing parallelism and pipelining. As code stores grow and application requirements become more complex, network processors are being programmed with heterogeneous threads that may execute code belonging to different tasks on a given micro-engine. In addition, most network applications are streaming applications that are typically processed in a pipelined fashion, so the tasks on different micro-engines are pipelined in a way that maximizes throughput. The tasks themselves can have different runtime performance demands. Traditionally, runtime management such as processor sharing and real-time scheduling is provided by the runtime environment (typically an operating system), which uses hardware support for timers and interrupts to time-slice the resource among the tasks. However, because of the stringent performance requirements on network processors (which process packets from very high-speed network traffic), neither OS nor hardware mechanisms are typically feasible or available.

In this paper, we show that it is very difficult and inefficient for the programmer to meet the constraints of runtime management by coding them statically. Because a hardware or OS solution is infeasible (even in the near future), the only choice left is a compiler approach.

We propose a complete compiler solution that automatically inserts the explicit context switch (ctx) instructions provided on these processors, so that the execution of programs is better controlled at runtime to meet their constraints. We show that such an approach is feasible, opening new application domains that would need heterogeneous thread programming. Such approaches are likely to become important for multi-core processors in general.
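
To make the idea of compiler-inserted context switches concrete, the following is a minimal sketch and not the algorithm of the paper. It assumes a hypothetical straight-line instruction list in which each instruction carries an estimated cycle cost, and a hypothetical per-thread cycle `budget` that stands in for the thread's runtime constraint. The pass places a `ctx` marker wherever the running cost since the last yield would otherwise exceed the budget.

```python
from typing import List, Tuple

# Hypothetical representation: (mnemonic, estimated cycle cost).
Instr = Tuple[str, int]


def insert_ctx(block: List[Instr], budget: int) -> List[Instr]:
    """Insert zero-cost 'ctx' markers into a straight-line block.

    A 'ctx' is emitted before any instruction that would push the cost
    accumulated since the last yield above `budget`. An instruction that
    alone costs more than `budget` is still emitted, on its own.
    """
    out: List[Instr] = []
    run = 0  # cycles accumulated since the last yield point
    for name, cost in block:
        if name == "ctx":
            # A yield already present in the input resets the count.
            out.append((name, cost))
            run = 0
            continue
        if run > 0 and run + cost > budget:
            out.append(("ctx", 0))
            run = 0
        out.append((name, cost))
        run += cost
    return out


# Illustrative input; mnemonics and costs are made up.
block = [("ld", 3), ("add", 1), ("mul", 2), ("st", 3), ("add", 1)]
names = [name for name, _ in insert_ctx(block, budget=4)]
print(names)
```

Under these assumptions, giving a thread a smaller budget makes it yield more often, which leaves a larger share of the micro-engine to the other threads resident on it. A real pass would also have to handle branches, loops, and memory latencies, none of which this sketch models.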