Adaptive Timeout Discovery Using the Network Weather Service
HPDC '02 Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing
Failure identification is a fundamental operation that network programs must be able to perform in order to handle exceptional conditions. In this paper, we explore the use of timeouts to perform failure identification at the application level. We evaluate static timeouts as well as dynamic timeouts based on forecasts from the Network Weather Service. For this evaluation, we perform experiments on a wide-area collection of 31 machines distributed across eight institutions. Though the conclusions are limited to the collection of machines used, we observe that a single static timeout is not reasonable, even for a collection of similar machines over time. Dynamic timeouts perform roughly as well as the best static timeouts and, more importantly, they provide a single methodology for timeout determination that should be effective for wide-area applications.
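To make the contrast between static and dynamic timeouts concrete, the following is a minimal sketch of one way a forecast-driven timeout could be computed. It is not the paper's method and does not use the Network Weather Service API: the class `AdaptiveTimeout`, its parameters, the sliding-window median forecaster, and the error multiplier are all assumptions chosen for illustration.

```python
from collections import deque
from statistics import median


class AdaptiveTimeout:
    """Hypothetical helper that derives a timeout from recent response times.

    Assumption: a sliding-window median stands in for a real forecaster.
    """

    def __init__(self, initial_timeout=30.0, window=20, error_multiplier=2.0):
        self.initial_timeout = initial_timeout    # static fallback, in seconds
        self.error_multiplier = error_multiplier  # safety margin on forecast error
        self.samples = deque(maxlen=window)       # recent response times
        self.errors = deque(maxlen=window)        # recent absolute forecast errors

    def forecast(self):
        """Predict the next response time, or None if there is no history."""
        return median(self.samples) if self.samples else None

    def observe(self, response_time):
        """Record a measured response time and the error of the prior forecast."""
        predicted = self.forecast()
        if predicted is not None:
            self.errors.append(abs(response_time - predicted))
        self.samples.append(response_time)

    def timeout(self):
        """Return forecast plus a multiple of the mean absolute forecast error."""
        predicted = self.forecast()
        if predicted is None or not self.errors:
            # Not enough history yet: behave like a static timeout.
            return self.initial_timeout
        mean_error = sum(self.errors) / len(self.errors)
        return predicted + self.error_multiplier * mean_error
```

In use, an application would call `observe()` after each successful exchange and pass `timeout()` to whatever blocking call it makes next, for example `socket.settimeout()`. Until enough measurements exist, the sketch falls back to the fixed initial value, which corresponds to the static case.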