A throughput-driven task creation and mapping for network processors

  • Authors: Lixia Liu; Xiao-Feng Li; Michael Chen; Roy D. C. Ju

  • Affiliations: Intel China Research Center Ltd., Beijing, China; Intel China Research Center Ltd., Beijing, China; Intel Corporation, Microprocessor Technology Lab, Santa Clara; Intel Corporation, Microprocessor Technology Lab, Santa Clara

  • Venue: HiPEAC'07 Proceedings of the 2nd international conference on High performance embedded architectures and compilers
  • Year: 2007

Abstract

Network processors are programmable devices that process packets at high speed. A network processor is typified by multi-threading and heterogeneous multiprocessing, which usually requires programmers to manually create multiple tasks and map them onto different processing elements. This paper addresses the problem of automating task creation and mapping of network applications onto the underlying hardware to maximize their throughput. We propose a throughput cost model to guide task creation and mapping, with the dual objective of minimizing the number of stages in the processing pipeline and maximizing the average throughput of the slowest task. The average throughput is modeled by taking communication cost, computation cost, memory access latency, and synchronization cost into account. We envision that programmers write small functions for network applications, so that we can use grouping and duplication to construct tasks from these functions. Finding the optimal way to create tasks from m functions and map them to n processors is NP-hard. Therefore, we present a practical and efficient heuristic algorithm with O((n+m)m) complexity and show that the solutions it obtains deliver excellent performance for typical network applications. The entire framework has been implemented in the Open Research Compiler (ORC), adapted to compile network applications written in a domain-specific dataflow language. Experimental results show that the code produced by our compiler achieves 100% throughput at the OC-48 input line rate. OC-48 is a fiber-optic connection that carries 2.488 Gbps, the rate our targeted hardware was designed for. We also demonstrate the importance of good task creation and mapping choices for achieving high throughput. Furthermore, we show that reducing communication cost and managing resources efficiently are the most important factors for maximizing throughput on the Intel IXP network processors.
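
To make the ideas of grouping, duplication, and "throughput of the slowest task" concrete, the following is a minimal toy sketch in Python. It is not the cost model or the heuristic from the paper, whose details the abstract does not give. Everything here is an assumption made for illustration: the functions form a linear chain, a stage's per-packet cost is simply its compute cost plus the cost of its incoming and outgoing channels (memory latency and synchronization are not modeled separately), duplicating a stage on k processors multiplies its rate by k, and a hypothetical `code_limit` caps how much code one task may hold. The names `Stage`, `duplicate`, `merge`, and `create_and_map` are likewise invented for this sketch.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Stage:
    compute: float   # per-packet compute cost of the grouped functions (assumed units: cycles)
    code_size: int   # total code size of the grouped functions (hypothetical constraint)
    copies: int = 1  # number of processors running a duplicate of this stage

def stage_rates(stages, comm):
    """Packets per cycle for each stage; comm[i] is the channel cost between stage i and i+1."""
    rates = []
    for i, s in enumerate(stages):
        cost = s.compute
        if i > 0:
            cost += comm[i - 1]          # receive from the previous stage
        if i < len(stages) - 1:
            cost += comm[i]              # send to the next stage
        rates.append(s.copies / cost)
    return rates

def pipeline_throughput(stages, comm):
    """The pipeline runs only as fast as its slowest stage."""
    return min(stage_rates(stages, comm))

def duplicate(stages, comm, n_procs):
    """One processor per stage, then give each spare processor to the slowest stage."""
    if len(stages) > n_procs:
        return None                      # not enough processors for this many stages
    stages = [replace(s, copies=1) for s in stages]
    for _ in range(n_procs - len(stages)):
        rates = stage_rates(stages, comm)
        slow = min(range(len(stages)), key=lambda i: rates[i])
        stages[slow] = replace(stages[slow], copies=stages[slow].copies + 1)
    return stages

def merge(stages, comm, i):
    """Group stage i with stage i+1; the channel between them disappears."""
    merged = Stage(stages[i].compute + stages[i + 1].compute,
                   stages[i].code_size + stages[i + 1].code_size)
    return stages[:i] + [merged] + stages[i + 2:], comm[:i] + comm[i + 1:]

def create_and_map(funcs, comm, n_procs, code_limit):
    """Greedy toy heuristic: keep merging adjacent stages while throughput does not drop."""
    stages = [Stage(c, size) for c, size in funcs]
    comm = list(comm)
    best = duplicate(stages, comm, n_procs)
    while len(stages) > 1:
        options = []
        for i in range(len(stages) - 1):
            if stages[i].code_size + stages[i + 1].code_size <= code_limit:
                s2, c2 = merge(stages, comm, i)
                d2 = duplicate(s2, c2, n_procs)
                score = pipeline_throughput(d2, c2) if d2 else 0.0
                options.append((score, s2, c2, d2))
        if not options:
            break                        # no merge fits within the code limit
        score, s2, c2, d2 = max(options, key=lambda o: o[0])
        if best is not None and score < pipeline_throughput(best, comm):
            break                        # merging further would lower throughput
        stages, comm, best = s2, c2, d2  # ties favor fewer stages
    return best, comm

# Example with made-up numbers: four functions as (compute cost, code size) pairs,
# three channels between them, four processors.
funcs = [(100, 300), (50, 200), (200, 500), (80, 300)]
plan, channels = create_and_map(funcs, [40, 10, 60], n_procs=4, code_limit=1000)
print(plan, pipeline_throughput(plan, channels))
```

In this toy setting, merging removes a channel cost but makes the merged stage heavier, while duplication offsets a heavy stage by spending processors on it. The sketch evaluates every feasible merge at each step, so it makes no attempt to match the O((n+m)m) bound stated in the abstract.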