A new approach to the maximum flow problem
STOC '86 Proceedings of the eighteenth annual ACM symposium on Theory of computing
Task allocation onto a hypercube by recursive mincut bipartitioning
Journal of Parallel and Distributed Computing
Efficient network flow based min-cut balanced partitioning
ICCAD '94 Proceedings of the 1994 IEEE/ACM international conference on Computer-aided design
Advanced compiler design and implementation
Advanced compiler design and implementation
Introduction to Algorithms
NetBench: a benchmarking suite for network processors
Proceedings of the 2001 IEEE/ACM international conference on Computer-aided design
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Automatically partitioning packet processing applications for pipelined architectures
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Exploiting coarse-grained task, data, and pipeline parallelism in stream programs
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Throughput-driven synthesis of embedded software for pipelined execution on multicore architectures
ACM Transactions on Embedded Computing Systems (TECS)
Optimizing throughput and latency under given power budget for network packet processing
INFOCOM'10 Proceedings of the 29th conference on Information communications
LATA: a latency and throughput-aware packet processing system
Proceedings of the 47th Design Automation Conference
A scenario-based run-time task mapping algorithm for MPSoCs
Proceedings of the 50th Annual Design Automation Conference
Hi-index | 0.00 |
Mapping packet processing applications onto embedded network processors (NP) is a challenging task due to the unique constraints of NP systems and the characteristics of network application domains. A remarkable difference with general multiprocessor task scheduling is that NPs are often programmed into a hybrid parallel and pipeline topology. In this paper, we introduce a multilevel balancing and refining algorithm for NP program mapping. We use a divide-and-conquer approach to recursively bipartition the task graph into disjoint subdomains. At each level of bipartition, the processing resources will be co-allocated so that an estimation of throughput can be derived. The bipartition continues until the code of the tasks can be fit into the instruction memory of processing elements. Then the algorithm iteratively refines the solution by migrating tasks from the bottleneck stage to other stages. The performance of our scheme is evaluated with a suite of NP benchmarks using SUIF/Machine SUIF compiler and Intel IXA Architecture Tool. The throughput improvement is significant: average throughput is increased by 20%, and the maximum is 108%.