Profiling and mapping of parallel workloads on network processors

Authors:
Ning Weng; Tilman Wolf
Affiliations:
University of Massachusetts, Amherst, MA; University of Massachusetts, Amherst, MA
Venue:
Proceedings of the 2005 ACM symposium on Applied computing
Year:
2005


Abstract

Network processors are embedded system-on-a-chip multiprocessors that are optimized to perform simple packet processing tasks at data rates of several gigabits per second. To meet the performance demands of increasing link speeds and more complex network applications, network processors are implemented with several dozen processor cores and execute multiple packet processing applications in parallel. The complexity of such systems makes it increasingly difficult for application developers to map applications to the various system resources and achieve optimal performance. We propose an automated profiling and mapping methodology for these highly parallel embedded systems that starts from a simple uniprocessor implementation of the networking application. An architecture-independent representation of the application's runtime behavior is used to map and schedule the individual processing steps onto the underlying hardware. An analytic performance model is used in the process to estimate system performance and to find a near-optimal solution through iteration.
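The mapping loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm or performance model: the task names, per-packet instruction counts, dependency edges, communication cost, and the random-search strategy are all assumptions chosen for illustration. The sketch takes a profiled task graph, scores each candidate task-to-core assignment with a toy analytic model (the per-packet load on the busiest core), and keeps the best assignment found over many iterations.

```python
import random

# Hypothetical profile: task name -> instructions executed per packet.
TASK_COST = {"parse": 120, "classify": 300, "lookup": 450,
             "rewrite": 150, "queue": 80}

# Hypothetical dependencies: an edge (u, v) means v consumes u's output.
EDGES = [("parse", "classify"), ("classify", "lookup"),
         ("lookup", "rewrite"), ("rewrite", "queue")]

COMM_COST = 40  # assumed per-packet cost of passing data between two cores


def bottleneck_load(mapping, num_cores):
    """Toy analytic model: per-packet work on the busiest core."""
    load = [0] * num_cores
    for task, core in mapping.items():
        load[core] += TASK_COST[task]
    for u, v in EDGES:
        if mapping[u] != mapping[v]:
            # Charge the transfer to both the sending and receiving core.
            load[mapping[u]] += COMM_COST
            load[mapping[v]] += COMM_COST
    return max(load)


def random_search(num_cores, iterations=2000, seed=0):
    """Keep the best of many random task-to-core assignments."""
    rng = random.Random(seed)
    tasks = sorted(TASK_COST)
    best_mapping, best_load = None, float("inf")
    for _ in range(iterations):
        candidate = {t: rng.randrange(num_cores) for t in tasks}
        load = bottleneck_load(candidate, num_cores)
        if load < best_load:
            best_mapping, best_load = candidate, load
    return best_mapping, best_load


if __name__ == "__main__":
    mapping, load = random_search(num_cores=3)
    print(mapping, load)
```

Under this toy model, a lower bottleneck load corresponds to a higher sustainable packet rate, since the busiest core limits how quickly packets can move through the system. A realistic model would also need to account for effects such as memory contention and scheduling order, which the sketch ignores.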