Software pipelining for packet filters
HPCC'07 Proceedings of the Third international conference on High Performance Computing and Communications
Packet filters are essential for network traffic and security management on the Internet. Filters implemented in software on general-purpose CPUs are very flexible but occasionally suffer from poor performance. To address this problem, we have investigated software pipelining techniques for loops that contain a number of conditional branches, for use in fast software-based packet filters. Building on our previous research, we apply the software pipelining approach here in an attempt to increase filter performance for large filter rule sets. We validate the effectiveness of the proposed approach on Intel x86-32/64 series processors as well as on Intel Itanium 2 processors, which speaks to its generality and practicality. On x86-64 processors, the software-pipelined code is 2.2 times faster than C-compiler-generated code and 1.8 times faster than carefully optimized hand-compiled code. In addition, the performance of the pipelined code on x86-64 processors is comparable to that on Itanium 2 processors, which provide predicate registers.
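The idea of software pipelining a branch-heavy filter loop can be illustrated with a minimal sketch. This is not the authors' implementation: the rule format (`struct rule`), the function names `match_naive` and `match_pipelined`, and the two-stage split are all assumptions made for illustration, and the sketch assumes every rule offset lies within the packet buffer. The naive loop loads, masks, compares, and branches once per rule; the pipelined version starts the load and mask for rule i+1 before the compare and branch for rule i, so the two stages can overlap.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rule format: a rule matches when
 * (pkt[offset] & mask) == value. Offsets are assumed to be in bounds. */
struct rule {
    uint8_t offset;
    uint8_t mask;
    uint8_t value;
};

/* Baseline: load, mask, compare, and branch once per rule.
 * Returns the index of the first matching rule, or -1. */
int match_naive(const uint8_t *pkt, const struct rule *rules, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if ((pkt[rules[i].offset] & rules[i].mask) == rules[i].value)
            return (int)i;
    }
    return -1;
}

/* Hand-pipelined variant with two stages:
 *   stage 1: load and mask the packet field for a rule
 *   stage 2: compare and branch
 * Stage 1 of rule i+1 is issued before stage 2 of rule i. */
int match_pipelined(const uint8_t *pkt, const struct rule *rules, size_t n)
{
    if (n == 0)
        return -1;

    /* Prologue: stage 1 for rule 0. */
    uint8_t field = (uint8_t)(pkt[rules[0].offset] & rules[0].mask);
    uint8_t want = rules[0].value;

    /* Kernel: stage 1 for rule i+1, then stage 2 for rule i. */
    for (size_t i = 0; i + 1 < n; i++) {
        uint8_t next_field =
            (uint8_t)(pkt[rules[i + 1].offset] & rules[i + 1].mask);
        uint8_t next_want = rules[i + 1].value;

        if (field == want)
            return (int)i;

        field = next_field;
        want = next_want;
    }

    /* Epilogue: stage 2 for the last rule. */
    return (field == want) ? (int)(n - 1) : -1;
}
```

Both functions return the same result for any rule set. Whether the pipelined form is actually faster depends on the compiler and processor, so the sketch shows only the loop restructuring and makes no performance claim.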