A throughput-driven task creation and mapping for network processors

Authors:
Lixia Liu;Xiao-Feng Li;Michael Chen;Roy D. C. Ju
Affiliations:
Intel China Research Center Ltd., Beijing, China;Intel China Research Center Ltd., Beijing, China;Intel Corporation, Microprocessor Technology Lab, Santa Clara;Intel Corporation, Microprocessor Technology Lab, Santa Clara
Venue:
HiPEAC'07 Proceedings of the 2nd international conference on High performance embedded architectures and compilers
Year:
2007


Abstract

Network processors are programmable devices that process packets at high speed. They are typified by multi-threading and heterogeneous multiprocessing, which usually requires programmers to manually create multiple tasks and map them onto different processing elements. This paper addresses the problem of automating task creation and mapping of network applications onto the underlying hardware to maximize throughput. We propose a throughput cost model to guide task creation and mapping, with the objective of simultaneously minimizing the number of stages in the processing pipeline and maximizing the average throughput of the slowest task. The average throughput is modeled by taking communication cost, computation cost, memory access latency, and synchronization cost into account. We envision that programmers write small functions for network applications, and we use grouping and duplication to construct tasks from these functions. Optimally creating tasks from m functions and mapping them to n processors is an NP-hard problem. Therefore, we present a practical and efficient heuristic algorithm with O((n+m)m) complexity and show that the solutions it obtains deliver excellent performance for typical network applications. The entire framework has been implemented in the Open Research Compiler (ORC), adapted to compile network applications written in a domain-specific dataflow language. Experimental results show that the code produced by our compiler achieves 100% throughput at the OC-48 input line rate. OC-48 is a fiber-optic connection that carries 2.488 Gbps, the rate our target hardware was designed for. We also demonstrate the importance of good task creation and mapping choices for achieving high throughput.
Furthermore, we show that reducing communication cost and managing resources efficiently are the most important factors for maximizing throughput on the Intel IXP network processors.
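
The interplay of grouping, duplication, and communication cost described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm or cost model: the `Stage` class, the function names, the greedy merge rule, and the single per-packet `comm_cost` term are all hypothetical simplifications chosen for illustration. Each stage's throughput is taken as its number of copies divided by its per-packet cost, and the pipeline's throughput is that of its slowest stage.

```python
from dataclasses import dataclass, replace

@dataclass
class Stage:
    functions: list   # names of the functions grouped into this stage
    cost: float       # assumed per-packet cost in cycles
    copies: int = 1   # processors running a duplicate of this stage

def throughput(stage, comm_cost):
    """Packets per cycle for one stage (hypothetical model)."""
    return stage.copies / (stage.cost + comm_cost)

def bottleneck(stages, comm_cost):
    """Throughput of the slowest stage; a one-stage pipeline pays no comm cost."""
    c = comm_cost if len(stages) > 1 else 0.0
    return min(throughput(s, c) for s in stages)

def duplicate(stages, n_procs, comm_cost):
    """Duplication: hand each spare processor to the currently slowest stage."""
    c = comm_cost if len(stages) > 1 else 0.0
    result = [replace(s, copies=1) for s in stages]
    for _ in range(n_procs - len(result)):
        slowest = min(result, key=lambda s: throughput(s, c))
        slowest.copies += 1
    return result

def merge_cheapest_pair(stages):
    """Grouping: fuse the adjacent pair of stages with the smallest combined cost."""
    i = min(range(len(stages) - 1),
            key=lambda j: stages[j].cost + stages[j + 1].cost)
    merged = Stage(stages[i].functions + stages[i + 1].functions,
                   stages[i].cost + stages[i + 1].cost)
    return stages[:i] + [merged] + stages[i + 2:]

def create_and_map(func_costs, n_procs, comm_cost):
    """Try every stage count from m down to 1 and keep the best bottleneck.

    Ties go to the later candidate, which has fewer stages.
    """
    stages = [Stage([name], cost) for name, cost in func_costs]
    best_score, best_stages = None, None
    while True:
        if len(stages) <= n_procs:
            candidate = duplicate(stages, n_procs, comm_cost)
            score = bottleneck(candidate, comm_cost)
            if best_score is None or score >= best_score:
                best_score, best_stages = score, candidate
        if len(stages) == 1:
            break
        stages = merge_cheapest_pair(stages)
    return best_score, best_stages

# Illustrative inputs only; the costs below are made up.
funcs = [("rx", 100), ("classify", 300), ("forward", 200), ("tx", 100)]
score, stages = create_and_map(funcs, n_procs=4, comm_cost=50)
```

With these made-up costs, the sketch ends up grouping all four functions into one stage duplicated on four processors, because every extra pipeline stage adds communication cost to each packet. Real hardware constraints such as code-store size or ordering requirements, which this sketch ignores, could rule out such a mapping.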