Optimizing packet accesses for a domain specific language on network processors

  • Authors:
  • Tao Liu;Xiao-Feng Li;Lixia Liu;Chengyong Wu;Roy Ju

  • Affiliations:
  • Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;Intel China Research Center Ltd., Beijing, China;Intel China Research Center Ltd., Beijing, China;Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;Microprocessor Technology Labs, Intel Corporation, Santa Clara, CA

  • Venue:
  • LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

Programming network processors remains a challenging task since their birth until recently when high-level programming environments for them are emerging. By employing domain specific languages for packet processing, the new environments try to hide hardware details from the programmers and enhance both the programmability of the systems and the portability of the applications. A frequent issue for the new environments to be widely adopted is their relatively low achievable performance compared to low-level, hand-tuned programming. In this paper we present two techniques, Packet Access Combining (PAC) and Compiler-Generated Packet Caching (CGPC), to optimize packet accesses, which are shown as the performance bottleneck in such new environments for packet processing applications. PAC merges multiple packet accesses into a single wider access; CGPC implements an automatic packet data caching mechanism without a hardware cache. Both techniques focus on reducing long memory latency and expensive memory traffic, and they also reduce instruction counts significantly. We have implemented the proposed techniques in a high level programming environment for network processor named Shangri-La. Our evaluation with standard NPF benchmarks shows that for the evaluated applications the two techniques can reduce the memory traffic by 90% and improve the packet throughput by 5.8 times, on average.