Parallel failure recovery techniques in cluster-based media servers

  • Authors:
  • Joahyung Lee;Inbum Jung

  • Affiliations:
  • Department of Computer Science and Engineering, Kangwon National University, Chuncheon, Korea;Department of Computer Science and Engineering, Kangwon National University, Chuncheon, Korea

  • Venue:
  • The Journal of Supercomputing
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

For large-scale video-on-demand (VOD) service, cluster servers are highlighted due to their high performance and low cost. A cluster server consists of a front-end node and multiple backend nodes. Though the increase in backend nodes provides more quality of service (QoS) streams, the possibility of backend node failure is proportionally increased. The failure causes not only the cessation of streaming services but also the loss of current playing positions. In this paper, when a backend node fails, recovery mechanisms are studied to support the streaming service continuously. Without considering the characteristics of cluster-based servers and MPEG media, the basic redundant array of independent disks (RAID) techniques cause a network bottleneck in the internal network path and demonstrate inefficient CPU usage in backend nodes. To address these problems, a new failure recovery mechanism is proposed based on the pipeline computing concept. The proposed method not only distributes the internal network traffic generated from the recovery operations but also utilizes the CPU time available in the backend nodes. In the experiments, even if a backend node fails, the proposed method provides continuous streaming media services within a short MTTR value as well as more QoS streams than the existing method.