Automatic on-line failure diagnosis at the end-user site

  • Authors:
  • Joseph Tucek;Shan Lu;Chengdu Huang;Spiros Xanthos;Yuanyuan Zhou

  • Affiliations:
  • Department of Computer Science, University of Illinois at Urbana-Champaign, IL (all authors)

  • Venue:
  • HOTDEP'06 Proceedings of the 2nd conference on Hot Topics in System Dependability - Volume 2
  • Year:
  • 2006

Abstract

Production-run software failures cause endless grief to end users and endless challenges to programmers, who commonly have incomplete information about the bug and face great hurdles in reproducing it. Users are often unable or unwilling to provide diagnostic information because of technical challenges and privacy concerns; even when the information is available, failure analysis is time-consuming. We propose performing the initial diagnosis automatically, at the end user's site. The moment of failure is a valuable commodity that programmers strive to reproduce; leveraging it directly reduces diagnosis effort while simultaneously addressing privacy concerns. Additionally, we propose a failure diagnosis protocol. As far as we know, this is the first such automatic protocol proposed for on-line diagnosis. By mimicking the steps a human programmer follows when dissecting a failure, we deduce important failure information. Beyond on-line use, this can also reduce the effort of in-house testing. We implement some of these ideas. Using lightweight checkpoint and rollback techniques together with dynamic, run-time software analysis tools, we initiate the automatic diagnosis of several bugs. Our preliminary results show that automatic diagnosis can efficiently and accurately find likely root causes and fault propagation chains. Further, normal execution overhead is only 2%.
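
The checkpoint, rollback, and re-execution idea summarized in the abstract can be illustrated with a toy sketch. This is not the authors' implementation or protocol: the `Diagnoser` class, the step functions, and the report layout are all hypothetical. `copy.deepcopy` stands in for a lightweight checkpoint mechanism, and recording which state keys each step changed stands in for the dynamic analysis tools that would run during replay.

```python
import copy


class Diagnoser:
    """Toy model: run cheaply, and on failure roll back and replay with tracing."""

    def __init__(self, state, checkpoint_every=3):
        self.state = state
        self.checkpoint_every = checkpoint_every
        self.checkpoint = (0, copy.deepcopy(state))

    def run(self, steps):
        """Normal execution: only periodic checkpoints, no tracing."""
        for i, step in enumerate(steps):
            if i % self.checkpoint_every == 0:
                self.checkpoint = (i, copy.deepcopy(self.state))
            try:
                step(self.state)
            except Exception as exc:
                # The failure is diagnosed on the spot instead of being shipped off.
                return self.diagnose(steps, i, exc)
        return None

    def diagnose(self, steps, failed_at, exc):
        """Roll back to the last checkpoint and re-execute with tracing enabled."""
        start, snapshot = self.checkpoint
        state = copy.deepcopy(snapshot)
        trace = []
        for i in range(start, failed_at + 1):
            before = copy.deepcopy(state)
            try:
                steps[i](state)
            except Exception as replay_exc:
                trace.append((i, steps[i].__name__, repr(replay_exc)))
                break
            changed = sorted(k for k in state if before.get(k) != state[k])
            trace.append((i, steps[i].__name__, changed))
        return {"failure": repr(exc), "replayed_from": start, "trace": trace}


# Hypothetical program steps; load_config contains the seeded bug.
def read_input(state):
    state["total"] = 10


def load_config(state):
    state["divisor"] = 0  # bug: a zero divisor is accepted


def compute(state):
    state["avg"] = state["total"] / state["divisor"]


report = Diagnoser({}).run([read_input, load_config, compute])
for entry in report["trace"]:
    print(entry)
```

In this sketch the trace lists the steps replayed since the last checkpoint and the state each one touched, which is a crude stand-in for a fault propagation chain: the write to `divisor` in `load_config` precedes the failing division in `compute`. Tracing is paid for only during replay, which reflects the abstract's point that normal execution should stay cheap.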