Pipelined Execution of Critical Sections Using Software-Controlled Caching in Network Processors

Authors:
Jinquan Dai;Long Li;Bo Huang
Affiliations:
Intel China Software Center;Intel China Software Center;Intel China Software Center
Venue:
Proceedings of the International Symposium on Code Generation and Optimization
Year:
2007

Abstract

To keep up with explosive growth in Internet packet-processing demands, modern network processors (NPs) employ a highly parallel, multi-threaded, multi-core architecture. In such a parallel paradigm, accesses to shared variables in external memory (and the associated memory latency) are contained in critical sections, so that different threads in the network processor execute them atomically and sequentially. In this paper, we present a novel program transformation, used in the Intel Auto-partitioning C Compiler for IXP, that exploits the inherent finer-grained parallelism of those critical sections using the software-controlled caching mechanism available in the NPs. As a result, different threads can execute those critical sections in a pipelined fashion, which effectively hides the memory latency and improves the performance of network applications. Experimental results show that the proposed transformation provides impressive speedup (up to 9.94) and scalability (up to 80 threads) for a real-world network application (a 10Gbps Ethernet Core/Metro Router).
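The latency-hiding argument in the abstract can be illustrated with a toy timing model. The sketch below is not the compiler's transformation and uses no IXP API. It assumes a hypothetical split of each critical section into a short "issue" stage and an "update" stage, assumes that the memory latency elapses outside any critical section, and uses made-up cycle counts. The function names and all parameters are illustrative.

```c
/* Toy timing model; every parameter is an illustrative assumption,
 * not a measurement or an interface from the paper. */

/* Baseline: each thread holds one large critical section that covers
 * issuing the access, waiting for external memory, and the update.
 * Threads therefore pay the full latency one after another. */
long serialized_cycles(int threads, long issue, long mem_latency, long compute)
{
    return (long)threads * (issue + mem_latency + compute);
}

/* Pipelined: the critical section is split into two short stages.
 * Stage 1 (issue) and stage 2 (update) are each serialized across
 * threads, but the memory latency between them overlaps with the
 * stages of other threads. */
long pipelined_cycles(int threads, long issue, long mem_latency, long compute)
{
    long issue_done = 0;   /* time at which the previous thread left stage 1 */
    long update_done = 0;  /* time at which the previous thread left stage 2 */

    for (int i = 0; i < threads; i++) {
        issue_done += issue;                        /* stage 1, in thread order */
        long data_ready = issue_done + mem_latency; /* latency is overlapped */
        long start = data_ready > update_done ? data_ready : update_done;
        update_done = start + compute;              /* stage 2, in thread order */
    }
    return update_done;
}
```

With 8 threads, an issue cost of 2 cycles, a memory latency of 100 cycles, and an update cost of 10 cycles, the serialized model takes 896 cycles and the pipelined model takes 182 cycles. In this model, only the first thread waits for the full latency. The figures show the shape of the effect only and do not correspond to the paper's measurements.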