A consistent checkpointing-recovery protocol for minimal number of nodes in mobile computing system

  • Authors:
  • Chandreyee Chowdhury;Sarmistha Neogy

  • Affiliations:
  • Department of Computer Science and Engineering, Jadavpur University, India;Department of Computer Science and Engineering, Jadavpur University, India

  • Venue:
  • HiPC'07 Proceedings of the 14th international conference on High performance computing
  • Year:
  • 2007

Abstract

The vast computing potential of mobile computing systems is often hampered by their susceptibility to transient and independent failures. To add reliability and high availability to such systems, checkpoint-based rollback recovery is one of the widely used techniques in scientific computing, database, telecommunication, and mission-critical applications. This paper presents a coordinated, nonblocking checkpointing and recovery technique for such systems that efficiently handles the constraints posed by the underlying wireless network. An initiator (a mobile support station, MSS) sends checkpoint requests to all other MSSs, and each MSS forwards the request only to those mobile hosts (MHs) that have communicated during the last checkpointing interval, relieving the wireless network of synchronization overhead. In addition, all acknowledged messages are logged at the home station of the receiver MH, so that only the faulty MHs need to recover after a failure, and no other process is affected by the fault or the subsequent recovery.
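The flow described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' protocol: the class and function names (`MobileHost`, `SupportStation`, `initiate_checkpoint`) are hypothetical, the wired and wireless networks are replaced by direct method calls, and truncating a host's message log when it checkpoints is an assumption made here for illustration. It shows two ideas: a station forwards a checkpoint request only to hosts that communicated in the current interval, and acknowledged messages are logged at the receiver's home station so that a failed host can replay them on its own.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MobileHost:
    """Hypothetical model of an MH."""
    name: str
    communicated: bool = False  # sent or received a message in this interval
    checkpoints: int = 0

    def take_checkpoint(self) -> None:
        self.checkpoints += 1
        self.communicated = False  # a new checkpointing interval begins


@dataclass
class SupportStation:
    """Hypothetical model of an MSS acting as home station for its MHs."""
    name: str
    hosts: Dict[str, MobileHost] = field(default_factory=dict)
    message_log: Dict[str, List[str]] = field(default_factory=dict)

    def register(self, mh: MobileHost) -> None:
        self.hosts[mh.name] = mh
        self.message_log[mh.name] = []

    def deliver(self, sender: MobileHost, receiver: str, payload: str) -> None:
        # Log the acknowledged message at the receiver's home station
        # and mark both endpoints as having communicated.
        self.message_log[receiver].append(payload)
        self.hosts[receiver].communicated = True
        sender.communicated = True

    def handle_checkpoint_request(self) -> List[str]:
        # Forward the request only to MHs that communicated in this interval.
        requested = []
        for mh in self.hosts.values():
            if mh.communicated:
                mh.take_checkpoint()
                # Assumption: messages logged before the checkpoint are
                # no longer needed for replay, so the log is truncated.
                self.message_log[mh.name].clear()
                requested.append(mh.name)
        return requested

    def replay_log(self, name: str) -> List[str]:
        # Messages a failed MH would replay after restoring its checkpoint.
        return list(self.message_log[name])


def initiate_checkpoint(stations: List[SupportStation]) -> Dict[str, List[str]]:
    # The initiator MSS sends the request to every MSS (itself included here).
    return {s.name: s.handle_checkpoint_request() for s in stations}
```

In this sketch a host that stayed idle during the interval is never contacted, which is the point of restricting the request to recently active MHs, and recovery of one host reads only that host's log at its home station.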