Supporting server-level fault tolerance in concurrent-push-based parallel video servers

  • Authors:
  • J. Y.B. Lee

  • Affiliations:
  • Dept. of Inf. Eng., Chinese Univ. of Hong Kong, Shatin

  • Venue:
  • IEEE Transactions on Circuits and Systems for Video Technology
  • Year:
  • 2001

Abstract

Parallel video servers have been proposed for building large-scale video-on-demand (VoD) systems from multiple low-cost servers. However, as more servers are added to scale up capacity, system-level reliability decreases, because the failure of any one server will cripple the entire system. To tackle this reliability problem, this paper proposes and analyzes architectures to support server-level fault tolerance in parallel video servers. Based on the concurrent push architecture proposed earlier, this paper tackles three problems pertaining to fault tolerance, namely redundancy management, redundant data transmission protocol, and real-time fault masking. First, redundant data based on erasure codes are added to the video data stored in the servers and are then delivered to the clients to support fault tolerance. Despite the success of distributed redundancy striping schemes such as RAID-5 in disk array implementations, we find that similar schemes extended to the server context do not scale well. Instead, we propose a redundant server scheme that is both scalable and has a lower total server buffer requirement. Second, two protocols are proposed to manage the transmission of redundant data to the clients, namely forward erasure correction, which always transmits redundant data, and on-demand correction, which transmits redundant data only after a server failure is detected. Third, to enable ongoing video sessions to maintain nonstop video playback during failure, we propose using fault masking at the client to recompute lost video data in real time. In particular, we derive the amount of client buffer required so that nonstop, continuous video playback can be maintained despite server failures.
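The abstract does not say which erasure code is used. As a rough illustration of client-side fault masking under a redundant server scheme, the sketch below assumes the simplest case: a single redundant server holding the byte-wise XOR parity of the blocks striped across the data servers. The function names (`xor_blocks`, `make_parity_group`, `mask_failure`) and the block contents are hypothetical and are not taken from the paper.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of a list of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_parity_group(data_blocks):
    """Return the data blocks plus one parity block (assumed single-parity code).

    Each entry stands for the block stored on one server; the last entry
    stands for the block on the redundant server.
    """
    return list(data_blocks) + [xor_blocks(data_blocks)]

def mask_failure(received):
    """Recompute at most one missing block (marked None) at the client.

    `received` holds one block per server, parity last. Returns only the
    data blocks, in playback order.
    """
    missing = [i for i, block in enumerate(received) if block is None]
    if len(missing) > 1:
        raise ValueError("single parity can mask only one server failure")
    received = list(received)
    if missing:
        survivors = [block for block in received if block is not None]
        # XOR of all surviving blocks equals the missing block.
        received[missing[0]] = xor_blocks(survivors)
    return received[:-1]

# Three data servers and one redundant server (illustrative values only).
group = make_parity_group([b"AAAA", b"BBBB", b"CCCC"])

# Simulate the failure of the second data server.
damaged = list(group)
damaged[1] = None
recovered = mask_failure(damaged)
```

In terms of the two protocols named in the abstract, forward erasure correction would correspond to the client always receiving the full group including parity, whereas on-demand correction would correspond to the parity block being sent only after a failure has been detected. That difference in transmission timing is not modelled here.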