Streaming applications are promising targets for effectively utilizing multicores because of their inherent amenability to pipelined parallelism. While existing methods of orchestrating streaming programs on multicores have mostly been static, real-world applications show ample variation in execution time, which may cause the achieved speedup and throughput to be sub-optimal. One of the principal challenges in moving towards dynamic orchestration has been the lack of approaches that can efficiently and accurately predict upcoming dynamic variations in execution well before they occur. In this paper, we propose an automated approach to predicting dynamic execution behavior that efficiently estimates the time that will be spent in different pipeline stages for upcoming inputs, without requiring program execution. This enables dynamic balancing or scheduling of execution to achieve better speedup. Our approach first uses dynamic taint analysis to automatically generate an input-based execution characterization of the streaming program, which identifies the key control points where variation in execution might occur, together with the input elements that cause these variations. We then use this characterization to automatically generate a lightweight emulator from the program. The emulator simulates the execution paths taken for new streaming inputs and estimates the execution time that will be spent processing them, enabling prediction of possible dynamic variations. We present experimental evidence that our technique can accurately and efficiently estimate execution behavior for several benchmarks. Our experiments show that dynamic orchestration using our predicted execution behavior can achieve considerably higher speedup than static orchestration.
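As a rough illustration of the idea, and not the paper's actual emulator, the sketch below models each input-dependent control point as a trip count read from the input plus an assumed per-iteration cost. It sums those costs per pipeline stage and then hands spare workers to the predicted bottleneck. The `ControlPoint` class, the stage names, the input fields, the cost values, and the greedy balancing policy are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ControlPoint:
    """Hypothetical model of one input-dependent branch or loop."""
    stage: str                         # pipeline stage that contains this control point
    trip_count: Callable[[dict], int]  # times the body runs, derived from input fields
    cost_per_trip: float               # assumed average cost of one body execution


def estimate_stage_times(points: List[ControlPoint], item: dict) -> Dict[str, float]:
    """Walk only the input-dependent control flow and sum assumed costs per stage."""
    times: Dict[str, float] = {}
    for p in points:
        times[p.stage] = times.get(p.stage, 0.0) + p.trip_count(item) * p.cost_per_trip
    return times


def balance_workers(times: Dict[str, float], workers: int) -> Dict[str, int]:
    """Give each stage one worker, then give each spare worker to the
    stage with the highest predicted time per worker (a simple greedy policy)."""
    alloc = {stage: 1 for stage in times}
    for _ in range(workers - len(times)):
        bottleneck = max(times, key=lambda s: times[s] / alloc[s])
        alloc[bottleneck] += 1
    return alloc


# Hypothetical characterization of a three-stage pipeline.
points = [
    ControlPoint("parse", lambda x: x["header_len"], 1.0),
    ControlPoint("decode", lambda x: x["blocks"], 5.0),
    ControlPoint("decode", lambda x: x["blocks"] if x["filtered"] else 0, 2.0),
    ControlPoint("render", lambda x: x["blocks"], 1.5),
]

# An upcoming input, described only by the fields the control points depend on.
item = {"header_len": 10, "blocks": 8, "filtered": True}

times = estimate_stage_times(points, item)
alloc = balance_workers(times, workers=6)
print(times)
print(alloc)
```

The estimate is computed from the input fields alone, without running the stages themselves, which is what lets a scheduler rebalance before the input arrives. A real characterization would come from taint analysis of the program rather than from hand-written lambdas.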