A feedback-driven proportion allocator for real-rate scheduling
OSDI '99 Proceedings of the third symposium on Operating systems design and implementation
Using Cohort Scheduling to Enhance Server Performance (Extended Abstract)
OM '01 Proceedings of the 2001 ACM SIGPLAN workshop on Optimization of middleware and distributed systems
SEDA: an architecture for well-conditioned, scalable internet services
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Towards the automatic optimal mapping of pipeline algorithms
Parallel Computing
Performance Evaluation of a Parallel Pipeline Computational Model for Space-Time Adaptive Processing
The Journal of Supercomputing
Compiling for stream processing
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Exploiting coarse-grained task, data, and pipeline parallelism in stream programs
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Streamware: programming general-purpose multicore processors using streams
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Parallel-stage decoupled software pipelining
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
The PARSEC benchmark suite: characterization and architectural implications
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Load balancing using work-stealing for pipeline parallelism in emerging applications
Proceedings of the 23rd international conference on Supercomputing
Flextream: Adaptive Compilation of Streaming Applications for Heterogeneous Architectures
PACT '09 Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques
Analytical Modeling of Pipeline Parallelism
PACT '09 Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques
Parallelism orchestration using DoPE: the degree of parallelism executive
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Parallel pattern detection for architectural improvements
HotPar'11 Proceedings of the 3rd USENIX conference on Hot topic in parallelism
Petri-nets as an intermediate representation for heterogeneous architectures
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II
Bottleneck identification and scheduling in multithreaded applications
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Parcae: a system for flexible parallel execution
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
A template library to integrate thread scheduling and locality management for NUMA multiprocessors
HotPar'12 Proceedings of the 4th USENIX conference on Hot Topics in Parallelism
Automatic generation of software pipelines for heterogeneous parallel systems
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
High-level support for pipeline parallelism on many-core architectures
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Utility-based acceleration of multithreaded applications on asymmetric CMPs
Proceedings of the 40th Annual International Symposium on Computer Architecture
On-the-fly pipeline parallelism
Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures
Adaptive online scheduling in storm
Proceedings of the 7th ACM international conference on Distributed event-based systems
Deterministic scale-free pipeline parallelism with hyperqueues
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Load-balanced pipeline parallelism
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
ANCS '13 Proceedings of the ninth ACM/IEEE symposium on Architectures for networking and communications systems
Hi-index | 0.00 |
Extracting high performance from Chip Multiprocessors requires that the application be parallelized. A common software technique to parallelize loops is pipeline parallelism in which the programmer/compiler splits each loop iteration into stages and each stage runs on a certain number of cores. It is important to choose the number of cores for each stage carefully because the core-to-stage allocation determines performance and power consumption. Finding the best core-to-stage allocation for an application is challenging because the number of possible allocations is large, and the best allocation depends on the input set and machine configuration. This paper proposes Feedback-Directed Pipelining (FDP), a software framework that chooses the core-to-stage allocation at run-time. FDP first maximizes the performance of the workload and then saves power by reducing the number of active cores, without impacting performance. Our evaluation on a real SMP system with two Core2Quad processors (8 cores) shows that FDP provides an average speedup of 4.2x which is significantly higher than the 2.3x speedup obtained with a practical profile-based allocation. We also show that FDP is robust to changes in machine configuration and input set.