Analytic modeling of network processors for parallel workload mapping

Authors:
Ning Weng; Tilman Wolf
Affiliations:
Southern Illinois University Carbondale, Carbondale, IL; University of Massachusetts Amherst, Amherst, MA
Venue:
ACM Transactions on Embedded Computing Systems (TECS)
Year:
2009

Abstract

Network processors are heterogeneous system-on-chip multiprocessors that are optimized to perform packet forwarding and processing tasks at gigabit data rates. To meet the performance demands of increasing link speeds and complex network applications, network processors are implemented with several dozen embedded processor cores and hardware accelerators that run multiple packet processing applications in parallel. The parallel nature of the processing system makes it increasingly difficult for application developers to understand and manage resources and to map processing tasks to the hardware. To address this problem, we present a methodology for profiling and analyzing network processor applications, mapping processing tasks to a generalized network processor architecture, and analytically determining the expected throughput performance. The key novelty of this work is not only the adaptation of application analysis and mapping algorithms to heterogeneous network processors, but also the fact that the entire process can be automated and hidden from the application developer. Starting with the analysis of a uniprocessor implementation of the application, the process yields a mapping of the partitioned application that shows the best performance for a given network processor system. The simplicity of the proposed randomized mapping algorithm allows the use of this methodology in network processor runtime systems, where dynamic reallocation of tasks is necessary but processing power is limited. We present results that show the effectiveness of the analysis and mapping methodology as well as its application to design space exploration.
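
The abstract does not give the details of the randomized mapping algorithm or of the analytic throughput model, so the following is only a minimal sketch of the general idea under simplifying assumptions. It assumes each task has a fixed per-packet cost, that cores are identical, and that throughput is limited by the most heavily loaded core. The names `estimate_throughput` and `randomized_mapping`, the cost values, and the bottleneck model are all hypothetical and are not taken from the paper.

```python
import random


def estimate_throughput(mapping, task_cost, num_cores):
    """Estimate packets per time unit for a task-to-core mapping.

    Assumption (not from the paper): the system is limited by the
    core with the largest summed per-packet task cost.
    """
    load = [0.0] * num_cores
    for task, core in mapping.items():
        load[core] += task_cost[task]
    return 1.0 / max(load)


def randomized_mapping(task_cost, num_cores, trials=1000, seed=0):
    """Try random task-to-core assignments and keep the best one found."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    tasks = list(task_cost)
    best_mapping, best_throughput = None, 0.0
    for _ in range(trials):
        # Draw one candidate mapping uniformly at random.
        candidate = {t: rng.randrange(num_cores) for t in tasks}
        throughput = estimate_throughput(candidate, task_cost, num_cores)
        if throughput > best_throughput:
            best_mapping, best_throughput = candidate, throughput
    return best_mapping, best_throughput


if __name__ == "__main__":
    # Hypothetical per-packet costs for three processing tasks.
    costs = {"classify": 4.0, "lookup": 2.0, "forward": 2.0}
    mapping, throughput = randomized_mapping(costs, num_cores=2)
    print(mapping, throughput)
```

Because each trial only draws a random assignment and evaluates a closed-form estimate, the search is cheap, which is in the spirit of the abstract's remark that a simple randomized algorithm suits runtime systems with limited processing power.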