Error detection and error classification: failure awareness in data transfer scheduling

  • Authors:
  • Mehmet Balman; Tevfik Kosar

  • Affiliations:
  • Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; Department of Computer Science, Center for Computation & Technology, Louisiana State University, Baton Rouge, LA 70803, USA

  • Venue:
  • International Journal of Autonomic Computing
  • Year:
  • 2010

Abstract

Data transfer in distributed environments is prone to frequent failures caused by back-end, system-level problems, such as connectivity failures, that are technically untraceable by users. Error messages are not logged efficiently and are sometimes irrelevant or unhelpful from the user's point of view. Our study explores the possibility of an efficient error detection and reporting system for such environments. Prior knowledge about the environment and awareness of the actual reason behind a failure would enable higher-level planners to make better and more accurate decisions. Well-defined error detection and error reporting methods are necessary to increase the usability and serviceability of existing data transfer protocols and data management systems. We investigate the applicability of early error detection and error classification techniques, and we propose an error reporting framework and a failure-aware data transfer life cycle to improve the arrangement of data transfer operations and to enhance the decision making of data transfer schedulers.