Toward predictable performance in software packet-processing platforms

Authors:
Mihai Dobrescu;Katerina Argyraki;Sylvia Ratnasamy
Affiliations:
EPFL, Switzerland;EPFL, Switzerland;UC Berkeley
Venue:
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Year:
2012

Citing 27
Cited 7

Footprints in the cache

ACM Transactions on Computer Systems (TOCS)
An analytical cache model

ACM Transactions on Computer Systems (TOCS)
A protocol-independent technique for eliminating redundant network traffic

Proceedings of the conference on Applications, Technologies, Architectures, and Protocols for Computer Communication
The Click modular router

ACM Transactions on Computer Systems (TOCS)
Analytical cache models with applications to cache partitioning

ICS '01 Proceedings of the 15th international conference on Supercomputing
Flexible Control of Parallelism in a Multiprocessor PC Router

Proceedings of the General Track: 2002 USENIX Annual Technical Conference
Predicting Inter-Thread Cache Contention on a Chip Multi-Processor Architecture

HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
False sharing and its effect on shared memory performance

SEDMS'93 USENIX Symposium on Experiences with Distributed and Multiprocessor Systems - Volume 4
Can software routers scale?

Proceedings of the ACM workshop on Programmable routers for extensible services of tomorrow
Using OS Observations to Improve Performance in Multicore Systems

IEEE Micro
Analysis and approximation of optimal co-scheduling on chip multiprocessors

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
RapidMRC: approximating L2 miss rate curves on commodity systems for online optimizations

Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
vGreen: a system for energy efficient computing in virtualized environments

Proceedings of the 14th ACM/IEEE international symposium on Low power electronics and design
RouteBricks: exploiting parallelism to scale software routers

Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Evaluation techniques for storage hierarchies

IBM Systems Journal
Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs?

Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Addressing shared resource contention in multicore processors via scheduling

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Resource-conscious scheduling for energy efficiency on multicore processors

Proceedings of the 5th European conference on Computer systems
Leveraging parallelism for multi-dimensional packet classification on software routers

Proceedings of the ACM SIGMETRICS international conference on Measurement and modeling of computer systems
PacketShader: a GPU-accelerated software router

Proceedings of the ACM SIGCOMM 2010 conference
Controlling parallelism in a multicore software router

Proceedings of the Workshop on Programmable Routers for Extensible Services of Tomorrow
An analysis of Linux scalability to many cores

OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
SSLShader: cheap SSL acceleration with commodity processors

Proceedings of the 8th USENIX conference on Networked systems design and implementation
ServerSwitch: a programmable and high performance platform for data center networks

Proceedings of the 8th USENIX conference on Networked systems design and implementation
The impact of memory subsystem resource sharing on datacenter applications

Proceedings of the 38th annual international symposium on Computer architecture
A case for NUMA-aware contention management on multicore systems

USENIX ATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
Design and implementation of a consolidated middlebox architecture

NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation

NaaS: network-as-a-service in the cloud

Hot-ICE'12 Proceedings of the 2nd USENIX conference on Hot Topics in Management of Internet, Cloud, and Enterprise Networks and Services
Design and implementation of a consolidated middlebox architecture

NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
The power of batching in the Click modular router

Proceedings of the Asia-Pacific Workshop on Systems
The power of batching in the Click modular router

APSys'12 Proceedings of the Third ACM SIGOPS Asia-Pacific conference on Systems
Bridging the gap between applications and networks in data centers

ACM SIGOPS Operating Systems Review
DeepDive: transparently identifying and managing performance interference in virtualized environments

USENIX ATC'13 Proceedings of the 2013 USENIX conference on Annual Technical Conference
High-Performance network traffic processing systems using commodity hardware

Data Traffic Monitoring and Analysis

Abstract

To become a credible alternative to specialized hardware, general-purpose networking needs to offer not only flexibility, but also predictable performance. Recent projects have demonstrated that general-purpose multicore hardware is capable of high-performance packet processing, but under a crucial simplifying assumption of uniformity: all processing cores see the same type/amount of traffic and run identical code, while all packets incur the same type of conventional processing (e.g., IP forwarding). Instead, we present a general-purpose packet-processing system that combines ease of programmability with predictable performance, while running a diverse set of applications and serving multiple clients with different needs. Offering predictability in this context is considered a hard problem because software processes contend for shared hardware resources (caches, memory controllers, buses) in unpredictable ways. Still, we show that, in our system, (a) the way in which resource contention affects performance is predictable and (b) the overall performance depends little on how different processes are scheduled on different cores. To the best of our knowledge, our results constitute the first evidence that, when designing software network equipment, flexibility and predictability are not mutually exclusive goals.