Video encoding and transcoding using machine learning

  • Authors:
  • Gerardo Fernandez Escribano;Rashid Jillani;Christopher Holder;Hari Kalva;Jose Luis Martinez Martinez;Pedro Cuenca

  • Affiliations:
  • Universidad de Castilla-La Mancha, Albacete, Spain;Florida Atlantic University, Boca Raton, FL;Florida Atlantic University, Boca Raton, FL;Florida Atlantic University, Boca Raton, FL;Universidad de Castilla-La Mancha, Albacete, Spain;Universidad de Castilla-La Mancha, Albacete, Spain

  • Venue:
  • Proceedings of the 9th International Workshop on Multimedia Data Mining: held in conjunction with the ACM SIGKDD 2008
  • Year:
  • 2008

Abstract

Machine learning has been widely used in video analysis and search applications. In this paper, we describe a non-traditional use of machine learning in video processing: video encoding and transcoding. Video encoding and transcoding are computationally intensive processes, and this complexity is increasing significantly with new compression standards such as H.264. Video encoders and transcoders have to manage the quality vs. complexity tradeoff carefully. Optimal encoding is prohibitively complex, so sub-optimal coding decisions are usually used to reduce complexity, but these decisions also sacrifice quality. Resource-constrained devices cannot use all the advanced coding tools offered by the standards because of their computational cost. We show that machine learning can be used to reduce the computational complexity of video coding and transcoding problems without significant loss in quality. We have developed the use of machine learning in video coding and transcoding and have evaluated it on several encoding and transcoding problems. We describe the general ideas in the application of machine learning and present the details of four different problems: 1) MPEG-2 to H.264 video transcoding, 2) H.263 to VP6 transcoding, 3) H.264 encoding, and 4) Distributed Video Coding (DVC). Our results show that the use of machine learning significantly reduces the complexity of encoders/transcoders and enables efficient video encoding on resource-constrained devices such as mobile devices and video sensors.
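The abstract does not spell out the learning method, so the following is only a minimal sketch of the general idea, under stated assumptions: an expensive exhaustive mode search labels a small training set offline, and a cheap learned rule then stands in for that search at run time. The feature (a per-macroblock residual energy), the two mode labels, the sample values, and the single-threshold classifier are all hypothetical choices made for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass

# Hypothetical labels for two candidate coding modes (illustrative only).
LARGE = "LARGE_PARTITION"
SMALL = "SMALL_PARTITION"


@dataclass
class Stump:
    """One-threshold classifier: a cheap stand-in for an exhaustive mode search."""
    threshold: float

    def predict(self, residual_energy: float) -> str:
        # Low residual energy suggests a smooth block, so pick the large partition.
        return LARGE if residual_energy <= self.threshold else SMALL


def train_stump(samples):
    """Pick the threshold that misclassifies the fewest training samples.

    samples: list of (residual_energy, label) pairs, where each label is
    assumed to come from an offline exhaustive search (hypothetical data).
    """
    best_threshold = None
    best_errors = len(samples) + 1
    # Candidate thresholds are the distinct feature values seen in training.
    for candidate in sorted({feature for feature, _ in samples}):
        errors = sum(
            1
            for feature, label in samples
            if (LARGE if feature <= candidate else SMALL) != label
        )
        if errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return Stump(best_threshold)


# Hypothetical training set: (residual energy, mode chosen by exhaustive search).
training = [(10.0, LARGE), (20.0, LARGE), (80.0, SMALL), (95.0, SMALL)]
stump = train_stump(training)
decision_smooth = stump.predict(15.0)
decision_detailed = stump.predict(90.0)
```

At run time the encoder or transcoder would evaluate only the predicted mode instead of every candidate, which is where the complexity saving would come from; a misprediction costs some quality, which matches the quality vs. complexity tradeoff described above.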