Threads and input/output in the synthesis kernal
SOSP '89 Proceedings of the twelfth ACM symposium on Operating systems principles
Linearizability: a correctness condition for concurrent objects
ACM Transactions on Programming Languages and Systems (TOPLAS)
Detecting violations of sequential consistency
SPAA '91 Proceedings of the third annual ACM symposium on Parallel algorithms and architectures
Lock-free data structures
Nonblocking algorithms and preemption-safe locking on multiprogrammed shared memory multiprocessors
Journal of Parallel and Distributed Computing
Specifying Concurrent Program Modules
ACM Transactions on Programming Languages and Systems (TOPLAS)
Decoupled access/execute computer architectures
ACM Transactions on Computer Systems (TOCS)
Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
A stateless, content-directed data prefetching mechanism
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
A Nonblocking Algorithm for Shared Queues Using Compare-and-Swap
IEEE Transactions on Computers
The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Automatic Extraction of Functional Parallelism from Ordinary Programs
IEEE Transactions on Parallel and Distributed Systems
StreamIt: A Language for Streaming Applications
CC '02 Proceedings of the 11th International Conference on Compiler Construction
A low-overhead coherence solution for multiprocessors with private cache memories
ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
Decoupled Software Pipelining with the Synchronization Array
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Multicores from the Compiler's Perspective: A Blessing or a Curse?
Proceedings of the international symposium on Code generation and optimization
Automatically partitioning packet processing applications for pipelined architectures
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Using elimination to implement scalable and lock-free FIFO queues
Proceedings of the seventeenth annual ACM symposium on Parallelism in algorithms and architectures
Overcoming the memory wall in packet processing: hammers or ladders?
Proceedings of the 2005 ACM symposium on Architecture for networking and communications systems
Automatic Thread Extraction with Decoupled Software Pipelining
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Exploiting locality to ameliorate packet queue contention and serialization
Proceedings of the 3rd conference on Computing frontiers
From Sequential Programs to Concurrent Threads
IEEE Computer Architecture Letters
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Frame shared memory: line-rate networking on commodity hardware
Proceedings of the 3rd ACM/IEEE Symposium on Architecture for networking and communications systems
Visualizing potential parallelism in sequential programs
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Serialization sets: a dynamic dependence-based parallel execution model
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Practice of parallelizing network applications on multi-core architectures
Proceedings of the 23rd international conference on Supercomputing
A concurrent dynamic analysis framework for multicore hardware
Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
Speculative parallelization using software multi-threaded transactions
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Decoupled software pipelining creates parallelization opportunities
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Distributed stream processing with DUP
NPC'10 Proceedings of the 2010 IFIP international conference on Network and parallel computing
A lock-free, cache-efficient shared ring buffer for multi-core architectures
Proceedings of the 5th ACM/IEEE Symposium on Architectures for Networking and Communications Systems
Scalable Graph Exploration on Multicore Processors
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Cache-aware lock-free queues for multiple producers/consumers and weak memory consistency
OPODIS'10 Proceedings of the 14th international conference on Principles of distributed systems
Cruiser: concurrent heap buffer overflow monitoring using lock-free data structures
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
A GPU-based high-throughput image retrieval algorithm
Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units
Efficient frequent item counting in multi-core hardware
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
An efficient unbounded lock-free queue for multi-core systems
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Understanding the performance of concurrent data structures on graphics processors
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Understanding parallelism in graph traversal on multi-core clusters
Computer Science - Research and Development
On-the-fly pipeline parallelism
Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures
Deterministic scale-free pipeline parallelism with hyperqueues
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Load-balanced pipeline parallelism
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Well-structured futures and cache locality
Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming
Accelerating sequential programs on commodity multi-core processors
Journal of Parallel and Distributed Computing
Hi-index | 0.00 |
Low overhead core-to-core communication is critical for efficient pipeline-parallel software applications. This paper presents FastForward, a cache-optimized single-producer/single-consumer concurrent lock-free queue for pipeline parallelism on multicore architectures, with weak to strongly ordered consistency models. Enqueue and dequeue times on a 2.66 GHz Opteron 2218 based system are as low as 28.5 ns, up to 5x faster than the next best solution. FastForward's effectiveness is demonstrated for real applications by applying it to line-rate soft network processing on Gigabit Ethernet with general purpose commodity hardware.