The program dependence graph and its use in optimization
ACM Transactions on Programming Languages and Systems (TOPLAS)
Locking effects in multiprocessor implementations of protocols
SIGCOMM '93 Conference proceedings on Communications architectures, protocols and applications
On parallelizing and optimizing the implementation of communication protocols
IEEE/ACM Transactions on Networking (TON)
Static scheduling algorithms for allocating directed task graphs to multiprocessors
ACM Computing Surveys (CSUR)
A comparison of list schedules for parallel processing systems
Communications of the ACM
NetBench: a benchmarking suite for network processors
Proceedings of the 2001 IEEE/ACM international conference on Computer-aided design
On the Granularity and Clustering of Directed Acyclic Task Graphs
IEEE Transactions on Parallel and Distributed Systems
DSC: Scheduling Parallel Tasks on an Unbounded Number of Processors
IEEE Transactions on Parallel and Distributed Systems
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Automatically partitioning packet processing applications for pipelined architectures
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Task Scheduling for Parallel Systems (Wiley Series on Parallel and Distributed Computing)
Task Scheduling for Parallel Systems (Wiley Series on Parallel and Distributed Computing)
Program mapping onto network processors by recursive bipartitioning and refining
Proceedings of the 44th annual Design Automation Conference
Automated task distribution in multicore network processors using statistical analysis
Proceedings of the 3rd ACM/IEEE Symposium on Architecture for networking and communications systems
A scalable multithreaded L7-filter design for multi-core servers
Proceedings of the 4th ACM/IEEE Symposium on Architectures for Networking and Communications Systems
Hi-index | 0.00 |
Current packet processing systems only aim at producing high throughput without considering packet latency reduction. For many real-time embedded network applications, it is essential that the processing time not exceed a given threshold. In this paper, we propose LATA, a LAtency and Throughput-Aware packet processing system for multicore architectures. Based on parallel pipeline core topology, LATA can satisfy the latency constraint and produce high throughput by exploiting fine-grained task-level parallelism. We implement LATA on an Intel machine with two Quad-Core Xeon E5335 processors and compare it with four other systems (Parallel, Greedy, Random and Bipar) for six network applications. LATA exhibits an average of 36.5% reduction of latency and a maximum of 62.2% reduction of latency for URL over Random with comparable throughput performance.