Network processors are embedded system-on-a-chip multiprocessors that are optimized to perform simple packet processing tasks at data rates of several gigabits per second. To meet the performance demands of increasing link speeds and more complex network applications, network processors are implemented with several dozen processor cores and execute multiple packet processing applications in parallel. The complexity of such systems makes it increasingly difficult for application developers to map applications to the various system resources and achieve optimal performance. We propose an automated profiling and mapping methodology for these highly parallel embedded systems that starts from a simple uniprocessor implementation of the networking application. An architecture-independent representation of the application's runtime behavior is used to map and schedule the individual processing steps onto the underlying hardware. An analytic performance model is used in the process to estimate system performance and to find a near-optimal solution through iteration.
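
To make the idea of iterating over mappings under an analytic estimate more concrete, the following is a minimal sketch and not the methodology or model of the paper. It assumes that profiling yields a per-packet cost for each processing step, and that system performance is limited by the most heavily loaded core; dependencies between steps and communication costs are ignored. The step names, the cost values, and the functions `bottleneck_load` and `iterate_mapping` are hypothetical.

```python
from typing import Dict, Tuple


def bottleneck_load(mapping: Dict[str, int], cost: Dict[str, float],
                    n_cores: int) -> float:
    """Assumed analytic estimate: per-packet work on the busiest core.

    Throughput is taken to be inversely proportional to this value.
    """
    load = [0.0] * n_cores
    for step, core in mapping.items():
        load[core] += cost[step]
    return max(load)


def iterate_mapping(cost: Dict[str, float], n_cores: int,
                    max_iters: int = 100) -> Tuple[Dict[str, int], float]:
    """Local search over step-to-core mappings (a heuristic, not optimal)."""
    # Start from the uniprocessor case: every step runs on core 0.
    mapping = {step: 0 for step in cost}
    best = bottleneck_load(mapping, cost, n_cores)
    for _ in range(max_iters):
        improved = False
        # Consider the most expensive steps first.
        for step in sorted(cost, key=cost.get, reverse=True):
            for core in range(n_cores):
                if core == mapping[step]:
                    continue
                previous = mapping[step]
                mapping[step] = core
                candidate = bottleneck_load(mapping, cost, n_cores)
                if candidate < best:
                    best = candidate      # keep the move
                    improved = True
                else:
                    mapping[step] = previous  # undo the move
        if not improved:
            break  # no single move lowers the estimate any further
    return mapping, best


if __name__ == "__main__":
    # Hypothetical per-packet costs (e.g. instruction counts from a profile).
    profile = {"parse": 40, "lookup": 30, "classify": 20, "forward": 10}
    mapping, load = iterate_mapping(profile, n_cores=2)
    print(mapping, load)
```

A realistic model would also have to account for memory contention, inter-core communication, and the ordering constraints between processing steps; the sketch only illustrates the estimate-and-refine loop.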