Automated task distribution in multicore network processors using statistical analysis

Authors:
Arindam Mallik;Yu Zhang;Gokhan Memik
Affiliations:
Northwestern University;Northwestern University;Northwestern University
Venue:
Proceedings of the 3rd ACM/IEEE Symposium on Architecture for networking and communications systems
Year:
2007


Abstract

Chip multiprocessors are the most common architecture in network processors. As network processors are used to implement increasingly complicated applications, distributing tasks among the cores is becoming an important problem. In this paper, we propose a new task allocation scheme for such architectures. The scheme relies on the inherently modular nature of networking applications and intelligently distributes modules among the execution cores. Additionally, we selectively replicate modules to parallelize the execution of tasks with longer processing times. We have developed a technique that uses the probability distribution of the execution times of the different modules in a networking application. The proposed schemes result in resource utilization of up to 95%, 89%, and 84% on average for processors with 2, 4, and 8 cores, respectively. The schemes are highly scalable and can improve throughput by 6.72 times for 8-core processors, aggregated over four representative applications. The combination of selective replication of modules and variation-aware task allocation results in up to 12.5% (9.9% on average) performance improvement compared to a scheme based on mean processing time alone.
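
The general idea of variation-aware allocation with selective replication can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not give the cost model or the allocation procedure, so everything below is an assumption made for illustration. The sketch assumes that each module is summarized by a mean and standard deviation of its execution time, that a module's cost is `mean + k * std`, that a replicated module's load divides evenly among its copies, and that tasks are placed by a greedy longest-first heuristic. The names `Module`, `cost`, `replicate`, `allocate`, and `utilization`, and all numeric values, are hypothetical.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    name: str
    mean: float  # mean execution time per packet (hypothetical units)
    std: float   # standard deviation of the execution time


def cost(m, k=1.0):
    # Assumed variation-aware cost: mean plus k standard deviations.
    return m.mean + k * m.std


def replicate(modules, limit, k=1.0):
    # Split every module whose cost exceeds `limit` into equal-share copies.
    # Assumes packets are divided evenly among the copies of a module.
    tasks = []
    for m in modules:
        c = cost(m, k)
        copies = max(1, math.ceil(c / limit))
        for i in range(copies):
            tasks.append((f"{m.name}#{i}", c / copies))
    return tasks


def allocate(tasks, cores):
    # Greedy heuristic: place the largest task on the least-loaded core.
    heap = [(0.0, core) for core in range(cores)]
    assignment = {core: [] for core in range(cores)}
    for name, load in sorted(tasks, key=lambda t: -t[1]):
        total, core = heapq.heappop(heap)
        assignment[core].append(name)
        heapq.heappush(heap, (total + load, core))
    loads = {core: total for total, core in heap}
    return assignment, loads


def utilization(loads):
    # Fraction of core time that is busy if the busiest core sets the pace.
    busiest = max(loads.values())
    return sum(loads.values()) / (len(loads) * busiest)


modules = [
    Module("classify", 4.0, 1.0),
    Module("lookup", 8.0, 2.0),
    Module("forward", 2.0, 0.5),
]

# With replication: "lookup" (cost 10) is split into two copies of cost 5.
_, with_copies = allocate(replicate(modules, limit=5.0), cores=4)
# Without replication: an infinite limit leaves every module as one task.
_, no_copies = allocate(replicate(modules, limit=float("inf")), cores=4)

print(utilization(with_copies), utilization(no_copies))
```

In this toy input, splitting the one expensive module lets the four cores share the load far more evenly than leaving it whole, which is the intuition behind replicating only the modules with long processing times. Setting `k=0` reduces the cost to the mean alone, which corresponds loosely to the mean-based baseline the abstract compares against.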