The case for hardware transactional memory in software packet processing

Authors:
Martin Labrecque;J. Gregory Steffan
Affiliations:
University of Toronto;University of Toronto
Venue:
Proceedings of the 6th ACM/IEEE Symposium on Architectures for Networking and Communications Systems
Year:
2010

References:

Limits of instruction-level parallelism
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems

Locking effects in multiprocessor implementations of protocols
SIGCOMM '93 Conference proceedings on Communications architectures, protocols and applications

The Click modular router
Proceedings of the seventeenth ACM symposium on Operating systems principles

Handling of packet dependencies: a critical issue for highly parallel network processors
CASES '02 Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems

NpBench: A Benchmark Suite for Control plane and Data plane Applications for Network Processors
ICCD '03 Proceedings of the 21st International Conference on Computer Design

Snort - Lightweight Intrusion Detection for Networks
LISA '99 Proceedings of the 13th USENIX conference on System administration

Automatically partitioning packet processing applications for pipelined architectures
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation

Prototyping Architectural Support for Program Rollback Using FPGAs
FCCM '05 Proceedings of the 13th Annual IEEE Symposium on Field-Programmable Custom Computing Machines

Overcoming the memory wall in packet processing: hammers or ladders?
Proceedings of the 2005 ACM symposium on Architecture for networking and communications systems

Evaluating Network Processors using NetBench
ACM Transactions on Embedded Computing Systems (TECS)

CommBench-a telecommunications benchmark for network processors
ISPASS '00 Proceedings of the 2000 IEEE International Symposium on Performance Analysis of Systems and Software

Expressing and exploiting concurrency in networked applications with aspen
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming

Designing extensible IP router software
NSDI'05 Proceedings of the 2nd conference on Symposium on Networked Systems Design & Implementation - Volume 2

NetFPGA--An Open Platform for Gigabit-Rate Network Switching and Routing
MSE '07 Proceedings of the 2007 IEEE International Conference on Microelectronic Systems Education

Internet clean-slate design: what and why?
ACM SIGCOMM Computer Communication Review

PIN: a binary instrumentation tool for computer architecture research and education
WCAE '04 Proceedings of the 2004 workshop on Computer architecture education: held in conjunction with the 31st International Symposium on Computer Architecture

Custom code generation for soft processors
ACM SIGARCH Computer Architecture News - Special issue on the 2006 reconfigurable and adaptive architecture workshop

Configurable Transactional Memory
FCCM '07 Proceedings of the 15th Annual IEEE Symposium on Field-Programmable Custom Computing Machines

LogTM-SE: Decoupling Hardware Transactional Memory from Caches
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture

Experimental study of router buffer sizing
Proceedings of the 8th ACM SIGCOMM conference on Internet measurement

MultiLayer processing - an execution model for parallel stateful packet processing
Proceedings of the 4th ACM/IEEE Symposium on Architectures for Networking and Communications Systems

Scaling Soft Processor Systems
FCCM '08 Proceedings of the 2008 16th International Symposium on Field-Programmable Custom Computing Machines

Practice of parallelizing network applications on multi-core architectures
Proceedings of the 23rd international conference on Supercomputing

Towards 100 gbit/s ethernet: multicore-based parallel communication protocol design
Proceedings of the 23rd international conference on Supercomputing

On the Performance of Contention Managers for Complex Transactional Memory Benchmarks
ISPDC '09 Proceedings of the 2009 Eighth International Symposium on Parallel and Distributed Computing

Is transactional programming actually easier?
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Language and compiler support for stream programs

Application-specific signatures for transactional memory in soft processors
ARC'10 Proceedings of the 6th international conference on Reconfigurable Computing: architectures, Tools and Applications

Cited by:

NetTM: faster and easier synchronization for soft multicores via transactional memory
Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays

Application-specific signatures for transactional memory in soft processors
ACM Transactions on Reconfigurable Technology and Systems (TRETS)

A regular expression matching engine with hybrid memories
Computer Standards & Interfaces

Abstract

Software packet processing is becoming more important as a way to enable differentiated and rapidly evolving network services. With increasing numbers of programmable processor and accelerator cores per network node, it is a challenge to support sharing and synchronization across them in a way that is scalable and easy to program. In this paper, we focus on parallel, threaded applications that have irregular control flow and frequently updated shared state that must be synchronized across threads. Conventional lock-based synchronization, however, is difficult to use and often results in frequent, conservative serialization of critical sections. We instead propose that transactional memory (TM) is a good match for software packet processing: it (i) can allow the system to optimistically exploit parallelism between the processing of packets whenever it is safe to do so, and (ii) is easy for programmers to use. Using the NetFPGA [1] platform and four threaded, shared-memory packet processing applications, we evaluate hardware support for TM (HTM) implemented in the reconfigurable FPGA fabric. Relative to NetThreads [2], our two-processor, four-way-multithreaded system with conventional lock-based synchronization, adding HTM increases packet throughput by 6%, 54%, and 57% for three of the four applications studied, owing to reduced conservative serialization.
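The toy model below makes "conservative serialization" concrete. It is a hedged illustration only. It is not the paper's HTM and not how NetThreads implements synchronization. It rests on these assumptions:

* Each packet's critical section is summarized by the shared keys it reads and writes.
* The lock-based baseline is one global lock, the most conservative case.
* Time advances in discrete rounds in which up to `threads` transactions run concurrently.
* Conflicts are resolved oldest-first: a later transaction that conflicts with an already committed one aborts and retries.

All names (`Txn`, `conflicts`, `lock_rounds`, `tm_rounds`, `per_flow_updates`) are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Txn:
    """One packet's critical section, summarized by the shared keys it touches."""
    pkt_id: int
    reads: FrozenSet[int]
    writes: FrozenSet[int]


def conflicts(a: Txn, b: Txn) -> bool:
    """True if one transaction writes a key the other reads or writes.
    Two read-only accesses to the same key never conflict."""
    return bool(a.writes & (b.reads | b.writes)) or bool(b.writes & a.reads)


def lock_rounds(txns: List[Txn]) -> int:
    """Assumed single global lock: every critical section runs alone, one per round."""
    return len(txns)


def tm_rounds(txns: List[Txn], threads: int) -> int:
    """Toy optimistic execution: up to `threads` transactions run per round.
    Oldest-first, a transaction commits unless it conflicts with one already
    committed in the same round; otherwise it aborts and retries next round."""
    pending = list(txns)
    rounds = 0
    while pending:
        rounds += 1
        running, rest = pending[:threads], pending[threads:]
        committed: List[Txn] = []
        aborted: List[Txn] = []
        for t in running:
            if any(conflicts(t, c) for c in committed):
                aborted.append(t)
            else:
                committed.append(t)
        pending = aborted + rest  # aborted transactions retry before newer packets
    return rounds


def per_flow_updates(flows: List[int]) -> List[Txn]:
    """Hypothetical workload: packet i reads and updates the counter for flows[i]."""
    return [Txn(i, frozenset({f}), frozenset({f})) for i, f in enumerate(flows)]


if __name__ == "__main__":
    # 8 thread contexts: two processors x four-way multithreading.
    flows = [0, 1, 2, 3, 4, 5, 6, 7, 0, 0, 1, 2, 3, 4, 5, 6]
    txns = per_flow_updates(flows)
    print("lock rounds:", lock_rounds(txns))  # 16
    print("TM rounds:", tm_rounds(txns, 8))   # 3
```

In this hypothetical flow mix, only packets 8 and 9 update the same counter. The global lock therefore needs 16 rounds, while the optimistic model needs 3. If every packet updated the same counter, `tm_rounds` would also take one round per packet. Optimism pays off only when concurrently processed packets rarely touch the same state. The sketch says nothing about how often that holds in the applications the paper measures.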