Compilers: principles, techniques, and tools
Compilers: principles, techniques, and tools
Register windows vs. register allocation
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Experience with a retargetable compiler for a commercial network processor
CASES '02 Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems
Resolving Register Bank Conflicts for a Network Processor
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
A Methodology and Tool Suite for C Compiler Generation from ADL Processor Models
Proceedings of the conference on Design, automation and test in Europe - Volume 2
Balancing register allocation across threads for a multithreaded network processor
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Building ASIPs: The Mescal Methodology
Building ASIPs: The Mescal Methodology
C compiler design for a network processor
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Hi-index | 0.00 |
Sophisticated C compiler support for network processors (NPUs) is required to improve their usability and consequently, their acceptance in system design. Nonetheless, high-level code compilation always introduces overhead, regarding code size and performance compared to handwritten assembly code. This overhead results partially from high-level function calls that usually introduce memory accesses in order to save and reload register contents. A key feature of many NPU architectures is hardware multi-threading support, in the form of separate register files, for fast context switching between different application tasks. In this paper, a new NPU code optimization technique to use such HW contexts is presented that minimizes the overhead for saving and reloading register contents for function calls via the runtime stack. The feasibility and the performance gain of this technique are demonstrated for the Infineon Technologies PP32 NPU architecture and typical network application kernels.