The parallelization of Simulink applications is currently left to the system designer and to the superscalar execution of the processors. State-of-the-art Simulink compilers excel at producing reliable, production-quality embedded code. However, they fail to exploit the concurrency naturally present in the programs and to use modern multi-core architectures effectively. One likely reason is that many Simulink applications are replete with loop-carried dependencies, which inhibit most parallel computing techniques and compiler transformations. In this paper, we introduce the concept of strands, which allow these data dependencies to be broken while preserving the original semantics of the Simulink program. Our fully automatic compiler transformations create a concurrent representation of the program, from which thread-level parallelism for multi-core systems is planned and orchestrated. To improve single-processor performance, we also exploit fine-grain (equation-level) parallelism through level-order scheduling inside each thread. The strand transformation has been implemented as an automatic transformation in a proprietary compiler. On a realistic aeronautic model executed on two processors, it achieves a speedup of up to 1.98 times over uniprocessor execution, compared with 1.75 times for the existing manual parallelization method.
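The following is a minimal sketch of level-order scheduling of equations. It is an illustration under stated assumptions, not the authors' compiler. It assumes that the equations of one model time step form a dependency graph, and that edges passing through delay blocks have already been removed, because those are loop-carried and cross time steps. Each equation is assigned the length of its longest dependency chain. Equations that share a level do not depend on each other within the step, so a scheduler may interleave them. The function `level_order` and the toy model `example` are hypothetical names introduced only for this illustration.

```python
from collections import defaultdict, deque


def level_order(deps):
    """Group the equations of one time step into independent levels.

    deps maps an equation name to the equations whose outputs it reads
    within the same step. Hypothetical assumption: loop-carried edges
    (e.g. through a Unit Delay) are already excluded, so the graph is a DAG.
    """
    # Collect every node, including pure inputs that only appear as sources.
    nodes = set(deps)
    for srcs in deps.values():
        nodes.update(srcs)

    indegree = {n: 0 for n in nodes}
    users = defaultdict(list)
    for node, srcs in deps.items():
        for src in srcs:
            indegree[node] += 1
            users[src].append(node)

    # Kahn's algorithm: a node's level is final once all its sources are done.
    level = {n: 0 for n in nodes}
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    visited = 0
    while ready:
        n = ready.popleft()
        visited += 1
        for user in users[n]:
            level[user] = max(level[user], level[n] + 1)
            indegree[user] -= 1
            if indegree[user] == 0:
                ready.append(user)

    if visited != len(nodes):
        raise ValueError("cycle found: an algebraic loop within one step")

    groups = defaultdict(list)
    for n, lv in level.items():
        groups[lv].append(n)
    return [sorted(groups[lv]) for lv in sorted(groups)]


# Hypothetical toy step: x_prev is a delay output read this step, and
# x_next is the value written for the next step. The x_next -> x_prev edge
# is loop-carried and therefore omitted.
example = {
    "a": ["u"],
    "b": ["x_prev"],
    "c": ["a", "b"],
    "y": ["c"],
    "x_next": ["c", "u"],
}

if __name__ == "__main__":
    print(level_order(example))
```

In this toy case, `a` and `b` land in the same level and can be evaluated together, as can `y` and `x_next`. The design choice to exclude delay edges is what keeps the intra-step graph acyclic.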