Recovering from distributable thread failures in distributed real-time Java

  • Authors:
  • Edward Curley;Binoy Ravindran;Jonathan Anderson;E. Douglas Jensen

  • Affiliations:
  • Virginia Tech, Blacksburg, Blacksburg, VA;Virginia Tech, Blacksburg, Blacksburg, VA;Virginia Tech, Blacksburg, Blacksburg, VA;The MITRE Corporation, Bedford, MA

  • Venue:
  • ACM Transactions on Embedded Computing Systems (TECS)
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

We consider the problem of recovering from the failures of distributable threads (“threads”) in distributed real-time systems that operate under runtime uncertainties including those on thread execution times, thread arrivals, and node failure occurrences. When a thread experiences a node failure, the result is a broken thread having an orphan. Under a termination model, the orphans must be detected and aborted, and exceptions must be delivered to the farthest, contiguous surviving thread segment for resuming thread execution. Our application/scheduling model includes the proposed distributable thread programming model for the emerging Distributed Real-Time Specification for Java (DRTSJ), together with an exception-handler model. Threads are subject to time/utility function (TUF) time constraints and an utility accrual (UA) optimality criterion. A key underpinning of the TUF/UA scheduling paradigm is the notion of “best-effort” where higher importance threads are always favored over lower importance ones, irrespective of thread urgency as specified by their time constraints. We present a thread scheduling algorithm called HUA and a thread integrity protocol called TPR. We show that HUA and TPR bound the orphan cleanup and recovery time with bounded loss of the best-effort property. Our implementation experience for HUA/TPR in the Reference Implementation of the proposed programming model for the DRTSJ demonstrates the algorithm/protocol's effectiveness.