Separating Performance Anomalies from Workload-Explained Failures in Streaming Servers

  • Authors:
  • Carlos Augusto Cunha;Luis Moura e Silva

  • Venue:
  • CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
  • Year:
  • 2012

Abstract

Video-streaming services are dominating the Internet, delivering content for video-on-demand, TV, education and collaborative work. Service parameters addressing the quality and continuity of video content are especially important because of human sensitivity to variations in video quality and the decades of quality patterns that traditional TV users have absorbed. Thus, a performance analysis and repair lifecycle at both the server and network levels is mandatory to avoid degrading the user experience. At the network level, there are several effective techniques based on temporal and spatial data redundancy, but they depend heavily on healthy servers with enough resources to serve both the client and recovery workloads. Excessive streaming workloads and performance anomalies (i.e., server resource exhaustion not explained by client requests) are typical causes of server performance failures. The former is often caused by memory caching of popular videos, which affects the number of requests accepted by the server and consequently blurs load admission mechanisms when the workload changes. The latter is caused by server-internal factors independent of client workloads (e.g., memory leaks and maintenance activities). Separating client-workload-related failures from performance anomalies is mandatory for selecting immediate repair actions, for capacity planning and for supporting fault repair. We evaluated the performance of the Naive Bayes and C4.5 decision tree algorithms for classifying these failure states using client and server performance metrics. Results show that it is possible to predict the type of failure with recall and accuracy above 90% for workload types with different popularity levels.
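
The classification step named in the abstract can be illustrated with a minimal Gaussian Naive Bayes sketch. This is not the authors' implementation or feature set: the two features (client requests per second and server memory utilisation), the class labels "workload" and "anomaly", the function names and every data point below are hypothetical, chosen only to show how client and server metrics might jointly separate the two failure types.

```python
import math
from collections import defaultdict


def fit(samples, labels):
    """Estimate a prior and per-feature Gaussian (mean, variance) for each class."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)
    model = {}
    for label, rows in grouped.items():
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            # Small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-6
            stats.append((mean, var))
        model[label] = (len(rows) / len(samples), stats)
    return model


def predict(model, x):
    """Return the class with the highest log-posterior under the naive assumption."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for value, (mean, var) in zip(x, stats):
            score += -0.5 * math.log(2 * math.pi * var)
            score -= (value - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def recall(model, samples, labels, target):
    """Fraction of samples truly labelled `target` that are predicted as `target`."""
    relevant = [x for x, y in zip(samples, labels) if y == target]
    hits = sum(1 for x in relevant if predict(model, x) == target)
    return hits / len(relevant)


# Made-up failure observations: (client requests per second, memory utilisation).
# In both classes memory is nearly exhausted; only the client load differs.
TRAIN_X = [(900, 0.95), (850, 0.92), (950, 0.97), (880, 0.93),
           (200, 0.94), (250, 0.96), (180, 0.91), (220, 0.98)]
TRAIN_Y = ["workload"] * 4 + ["anomaly"] * 4

model = fit(TRAIN_X, TRAIN_Y)
```

In this toy setup, calling `predict(model, (920, 0.94))` asks whether a failure seen under heavy client load looks workload-explained, and `recall(model, TRAIN_X, TRAIN_Y, "anomaly")` computes recall for the anomaly class. A realistic evaluation would use held-out data and many more metrics than the two assumed here.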