A contextual analysis of the YouTube duplicate content

  • Authors:
  • Tiago Rodrigues;Fabrício Benevenuto;Virgílio Almeida;Jussara Almeida;Marcos Gonçalves

  • Affiliations:
  • UFMG, Belo Horizonte/Brazil;UFMG, Belo Horizonte/Brazil;UFMG, Belo Horizonte/Brazil;UFMG, Belo Horizonte/Brazil;UFMG, Belo Horizonte/Brazil

  • Venue:
  • WebMedia '09 Proceedings of the XV Brazilian Symposium on Multimedia and the Web
  • Year:
  • 2009

Abstract

Videos have become a predominant part of users' daily lives on the Web, especially with the emergence of online video social networks such as YouTube. Since users can independently share videos in these systems, some videos can be duplicates (i.e., identical or very similar videos). Despite having the same content, duplicates may differ, for example, in their associated metadata (e.g., tags, title) and their popularity scores (e.g., number of views, comments). Quantifying these differences is important for three reasons. First, it helps us understand how users associate metadata with videos on YouTube, which is crucial for video information retrieval mechanisms and recommendation systems. Second, it helps us understand the factors that influence the popularity of videos, which is essential for associating advertisements with videos and for addressing performance issues related to the use of caches and CDNs. Third, it supports the detection of opportunistic actions, which pollute the system and compromise its use. This work presents a broad characterization of the differences among identical contents in online video sharing systems. Using a large video sample collected from YouTube, we construct a data set of duplicates. Besides quantifying contextual differences among duplicates, our results also reveal the presence of suspicious behavior in the creation and association of metadata with videos.