Parallel Computer Architecture: A Hardware/Software Approach
Parallel Computer Architecture: A Hardware/Software Approach
Exploiting coarse-grained task, data, and pipeline parallelism in stream programs
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Reducing fine-grain communication overhead in multithread code generation for heterogeneous MPSoC
SCOPES '07 Proceedingsof the 10th international workshop on Software & compilers for embedded systems
FastForward for Efficient Pipeline Parallelism
PACT '07 Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
Speculative Decoupled Software Pipelining
PACT '07 Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
Compiler and hardware support for reducing the synchronization of speculative threads
ACM Transactions on Architecture and Code Optimization (TACO)
Analytical Modeling of Pipeline Parallelism
PACT '09 Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques
Towards parallel execution of IEC 61131 industrial cyber-physical systems applications
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Hi-index | 0.00 |
Modern automotive and aerospace embedded applications require very high-performance simulations that are able to produce new values every microsecond. Simulations must now rely on scalable performance of multi-core systems rather than faster clock frequencies. Novel parallelization techniques are needed to satisfy the industrial simulation demands that are essential for the development of safety-critical systems. Simulink formalism is the industrial de facto standard, but current state-of-the-art simulation and code generation techniques fail to fully exploit the parallelism in modern multi-core systems. However, closed-loop and dynamic system simulations are very difficult to parallelize because of the loop-carried dependencies. In this paper we introduce a novel skewed pipelining technique that overcomes these difficulties and allows loop-carried Simulink applications to be executed concurrently in multi-core systems. By delaying the forwarding of values for a few iterations, we can break some data dependencies and coarsen the granularity of programs. This improves the concurrency and reduces the high cost of inter-processor communication. Implementation studies to demonstrate the viability of our method on a commodity multi-core system with 2, 3, and 4 processors show a 1.72, 2.38, and 3.33 fold speedup over uniprocessor execution.