Programming network processors is challenging. To sustain high line rates, network processors have extremely tight memory access and instruction budgets. Achieving desired performance has traditionally required hand-coded assembly. Researchers have recently proposed high-level programming languages for packet processing, but the challenges of compiling these languages into code that is competitive with hand-tuned assembly remain unanswered.

This paper describes the Shangri-La compiler, which accepts a packet program written in a C-like high-level language and applies scalar and specialized optimizations to generate a highly optimized binary. Hot code paths identified by profiling are mapped across processing elements to maximize processor utilization. Since our compilation target has no hardware caches, software-controlled caches are generated for frequently accessed application data structures. Packet handling optimizations significantly reduce per-packet memory access and instruction counts. Finally, a custom stack model maps stack frames to the fastest levels of the target processor's heterogeneous memory hierarchy.

Binaries generated by the compiler were evaluated on the Intel IXP2400 network processor with eight packet processing cores and eight threads per core. Our results show the importance of both traditional and specialized optimization techniques for achieving the maximum forwarding rates on three network applications, L3-Switch, MPLS and Firewall.
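
To make the idea of a software-controlled cache concrete, here is a minimal Python sketch of a direct-mapped, read-only cache placed in front of a slow lookup table. It is an illustration only, not the code that the Shangri-La compiler generates: the class `SoftwareCache`, the `route_table` dictionary, the cache size, and the access trace are all hypothetical and chosen for the example.

```python
class SoftwareCache:
    """Direct-mapped, read-only software cache in front of a slow table.

    Hypothetical illustration; not taken from the Shangri-La compiler.
    """

    def __init__(self, backing, num_lines=4):
        self.backing = backing            # stands in for slow off-chip memory
        self.num_lines = num_lines
        self.tags = [None] * num_lines    # key currently held by each cache line
        self.data = [None] * num_lines    # cached value for each cache line
        self.slow_reads = 0               # accesses that reached the backing store

    def lookup(self, key):
        line = key % self.num_lines       # direct-mapped: one candidate line per key
        if self.tags[line] == key:        # hit: served from fast local storage
            return self.data[line]
        self.slow_reads += 1              # miss: pay for one slow memory access
        value = self.backing[key]
        self.tags[line] = key             # fill the line, evicting what was there
        self.data[line] = value
        return value


# Hypothetical forwarding table: destination id -> output port.
route_table = {dest: dest % 3 for dest in range(64)}
cache = SoftwareCache(route_table, num_lines=4)

# A trace dominated by two destinations that map to different cache lines.
trace = [5, 5, 10, 5, 10, 5]
ports = [cache.lookup(dest) for dest in trace]
print(ports, cache.slow_reads)
```

In this trace only the first access to each destination reaches the backing table, so six lookups cost two slow reads. A read-only cache is the simplest case; data that can be updated while packets are in flight would additionally need an invalidation or coherence scheme, which this sketch does not attempt.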