Nuclear disaster early warning systems are based on simulations of the atmospheric dispersion of radioactive pollutants that may have been released into the atmosphere as a result of an accident at a nuclear power plant. Currently, the calculation is performed by a chain of nine sequential FORTRAN and C/C++ simulation codes. The example early warning system we focus on in this paper must meet a new requirement of a maximum response time of 120 seconds, whereas computing even a single simulation step currently exceeds this limit. To improve performance, we propose a pipeline parallelization of the simulation workflow on a multi-core system, which yields a 4.5x speedup over the sequential execution time on a dual quad-core machine. The scheduling problem that arises is to maximize the number of iterations of the dispersion calculation algorithm without exceeding the maximum response time limit. In the context of our example application, a static scheduling strategy (e.g., firing iterations at a fixed rate) proves inappropriate because it cannot tolerate faults that may occur during regular use (e.g., CPU failure, software errors, heavy load bursts). In this paper we show how a simple PI controller keeps the realized response time of the workflow near a desired value in different failure and heavy-load scenarios by automatically reducing the throughput of the system when necessary, thus improving the system's fault tolerance.
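The control scheme described above can be illustrated with a minimal sketch: a PI controller measures the realized response time of the workflow and adjusts the iteration firing rate (throughput) so that the response time settles near a setpoint. The abstract does not specify gains, a plant model, or an implementation language, so the class name, the gain values, and the toy "response time grows with firing rate" plant below are all illustrative assumptions, not the authors' method.

```python
class PIController:
    """Hypothetical PI controller for the firing rate of a pipelined
    workflow. Setpoint and gains are illustrative, not from the paper."""

    def __init__(self, kp, ki, setpoint, rate_min, rate_max):
        self.kp = kp                  # proportional gain (assumed value)
        self.ki = ki                  # integral gain (assumed value)
        self.setpoint = setpoint      # desired response time in seconds
        self.rate_min = rate_min      # lower bound on iterations/second
        self.rate_max = rate_max      # upper bound on iterations/second
        self.integral = 0.0           # accumulated error

    def update(self, measured_response_time, dt):
        # Positive error: the workflow is faster than required, so the
        # firing rate may rise. Negative error: the response time limit
        # is threatened (e.g., after a CPU failure or load burst), so
        # the controller automatically reduces throughput.
        error = self.setpoint - measured_response_time
        self.integral += error * dt
        rate = self.kp * error + self.ki * self.integral
        # Clamp to the physically meaningful range of firing rates.
        return max(self.rate_min, min(self.rate_max, rate))


# Toy closed loop: assume response time increases linearly with the
# firing rate (a stand-in for the real pipelined workflow), so the
# 120-second setpoint is reached at a rate of 2 iterations/second.
ctrl = PIController(kp=0.01, ki=0.005, setpoint=120.0,
                    rate_min=0.1, rate_max=10.0)
rate = 1.0
for _ in range(200):
    response_time = 60.0 + 30.0 * rate   # assumed plant model
    rate = ctrl.update(response_time, dt=1.0)
```

Under this toy plant the loop settles at a rate of about 2 iterations per second, where the simulated response time equals the 120-second setpoint; with a slower plant (e.g., after a failure), the same controller would settle at a lower rate, which is the throughput-reduction behavior the paper describes.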