Feedback-directed pipeline parallelism
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Partitioning streaming parallelism for multi-cores: a machine learning based approach
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Skewed pipelining for parallel simulink simulations
Proceedings of the Conference on Design, Automation and Test in Europe
Parallelism orchestration using DoPE: the degree of parallelism executive
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Expressing pipeline parallelism using TBB constructs: a case study on what works and what doesn't
Proceedings of the compilation of the co-located workshops on DSM'11, TMC'11, AGERE!'11, AOOPES'11, NEAT'11, & VMIL'11
Work stealing strategies for parallel stream processing in soft real-time systems
ARCS'12 Proceedings of the 25th international conference on Architecture of Computing Systems
A template library to integrate thread scheduling and locality management for NUMA multiprocessors
HotPar'12 Proceedings of the 4th USENIX conference on Hot Topics in Parallelism
Pipelining for cyclic control systems
Proceedings of the 16th international conference on Hybrid systems: computation and control
On-the-fly pipeline parallelism
Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures
Deterministic scale-free pipeline parallelism with hyperqueues
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Load-balanced pipeline parallelism
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Using machine learning to partition streaming programs
ACM Transactions on Architecture and Code Optimization (TACO)
PAIS: Parallelism-aware interconnect scheduling in multicores
ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers
Hi-index | 0.00 |
Parallel programming is a requirement in the multi-core era. One of the most promising techniques to make parallel programming available for the general users is the use of parallel programming patterns. Functional pipeline parallelism is a pattern that is well suited for many emerging applications, such as streaming and "Recognition, Mining and Synthesis" (RMS) workloads. In this paper we develop an analytical model for pipeline parallelism based on queueing theory. The model is useful to both characterize the performance and efficiency of existing implementations and to guide the design of new pipeline algorithms. We demonstrate the usefulness of the model by characterizing and optimizing two of the PARSEC benchmarks, ferret and dedup. We identified two issues with these codes: load imbalance and I/O bottlenecks. We addressed load imbalance using two techniques: i) parallel pipeline stage collapsing; and ii) dynamic scheduling. We implemented these optimizations using Pthreads and the Threading Building Blocks (TBB) libraries. We compare the performance of different alternatives and we note that the TBB implementation based on work stealing outperforms all other variants.