Optimizing packet accesses for a domain specific language on network processors

Authors:
Tao Liu;Xiao-Feng Li;Lixia Liu;Chengyong Wu;Roy Ju
Affiliations:
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;Intel China Research Center Ltd., Beijing, China;Intel China Research Center Ltd., Beijing, China;Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;Microprocessor Technology Labs, Intel Corporation, Santa Clara, CA
Venue:
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Year:
2005

Abstract

Programming network processors has been a challenging task since their introduction; only recently have high-level programming environments for them begun to emerge. By employing domain-specific languages for packet processing, these new environments try to hide hardware details from programmers and to improve both the programmability of the systems and the portability of the applications. A common obstacle to their wide adoption is their relatively low achievable performance compared with low-level, hand-tuned code. In this paper we present two techniques, Packet Access Combining (PAC) and Compiler-Generated Packet Caching (CGPC), to optimize packet accesses, which we show to be the performance bottleneck for packet processing applications in such environments. PAC merges multiple packet accesses into a single wider access; CGPC implements an automatic packet data caching mechanism in the absence of a hardware cache. Both techniques target long memory latency and expensive memory traffic, and they also significantly reduce instruction counts. We have implemented the proposed techniques in Shangri-La, a high-level programming environment for network processors. Our evaluation with standard NPF benchmarks shows that, on average across the evaluated applications, the two techniques reduce memory traffic by 90% and improve packet throughput by 5.8 times.