Program mapping onto network processors by recursive bipartitioning and refining

Authors:
Jia Yu;Jingnan Yao;Laxmi Bhuyan;Jun Yang
Affiliations:
University of California Riverside, Riverside, CA;University of California Riverside, Riverside, CA;University of California Riverside, Riverside, CA;University of Pittsburgh, Pittsburgh, PA
Venue:
Proceedings of the 44th annual Design Automation Conference
Year:
2007

Abstract

Mapping packet processing applications onto embedded network processors (NPs) is challenging because of the unique constraints of NP systems and the characteristics of the network application domain. A notable difference from general multiprocessor task scheduling is that NPs are often programmed in a hybrid parallel and pipeline topology. In this paper, we introduce a multilevel balancing and refining algorithm for NP program mapping. We use a divide-and-conquer approach to recursively bipartition the task graph into disjoint subdomains. At each level of bipartitioning, processing resources are co-allocated so that an estimate of throughput can be derived. Bipartitioning continues until the code of the tasks fits into the instruction memory of the processing elements. The algorithm then iteratively refines the solution by migrating tasks from the bottleneck stage to other stages. We evaluate the scheme on a suite of NP benchmarks using the SUIF/Machine SUIF compiler and the Intel IXA Architecture Tool. The throughput improvement is significant: 20% on average and up to 108%.
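
The two phases described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify how each cut is chosen, so the sketch assumes a linear chain of tasks and picks the cut that best balances work, then splits the available processing elements (PEs) in proportion to that work. The names `Task`, `Stage`, `bipartition`, `refine`, and the `imem` parameter are hypothetical, as is the cost model (stage time = total cycles divided by the number of PEs replicating the stage).

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    cycles: int     # per-packet processing cost (assumed positive)
    code_size: int  # instruction-memory footprint


@dataclass
class Stage:
    tasks: list
    pes: int  # processing elements that replicate this stage in parallel

    @property
    def time(self):
        # Estimated per-packet time: work is shared evenly by the stage's PEs.
        return sum(t.cycles for t in self.tasks) / self.pes

    @property
    def code(self):
        return sum(t.code_size for t in self.tasks)


def bipartition(tasks, pes, imem):
    """Recursively split an ordered task chain into pipeline stages."""
    # Stop when the code fits, or when the region cannot be split further
    # (a real mapper would have to report the latter case as infeasible).
    if sum(t.code_size for t in tasks) <= imem or len(tasks) == 1 or pes == 1:
        return [Stage(list(tasks), pes)]
    total = sum(t.cycles for t in tasks)
    # Assumed cut rule: the position that best balances work on both sides.
    best_cut, best_gap, left = 1, float("inf"), 0
    for i in range(1, len(tasks)):
        left += tasks[i - 1].cycles
        gap = abs(total - 2 * left)
        if gap < best_gap:
            best_cut, best_gap = i, gap
    left_work = sum(t.cycles for t in tasks[:best_cut])
    # Co-allocate PEs in proportion to work, keeping at least one per side.
    left_pes = min(pes - 1, max(1, round(pes * left_work / total)))
    return (bipartition(tasks[:best_cut], left_pes, imem)
            + bipartition(tasks[best_cut:], pes - left_pes, imem))


def refine(stages, imem):
    """Migrate boundary tasks off the bottleneck stage while that helps."""
    while True:
        b = max(range(len(stages)), key=lambda i: stages[i].time)
        bottleneck = stages[b].time
        moved = False
        # Candidates: first task to the previous stage, last task to the next.
        for nb, idx in ((b - 1, 0), (b + 1, -1)):
            if not (0 <= nb < len(stages)) or len(stages[b].tasks) < 2:
                continue
            task, target = stages[b].tasks[idx], stages[nb]
            new_time = (sum(t.cycles for t in target.tasks) + task.cycles) / target.pes
            if target.code + task.code_size <= imem and new_time < bottleneck:
                stages[b].tasks.pop(idx)
                if nb < b:
                    target.tasks.append(task)
                else:
                    target.tasks.insert(0, task)
                moved = True
                break
        if not moved:
            return stages


if __name__ == "__main__":
    chain = [Task(n, c, 20) for n, c in zip("ABCDEF", [120, 100, 100, 100, 100, 100])]
    stages = bipartition(chain, pes=3, imem=100)
    print("before:", [(len(s.tasks), s.pes, s.time) for s in stages])
    refine(stages, imem=100)
    print("after: ", [(len(s.tasks), s.pes, s.time) for s in stages])
```

In this toy input, bipartitioning yields two stages with 2 and 1 PEs, and the refinement step moves one task out of the single-PE bottleneck stage, which lowers the estimated bottleneck time. Throughput in this model is simply the reciprocal of the slowest stage's time, so any migration that shortens the bottleneck without creating a worse one improves the estimate.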