Semantic concept detection for user-generated video content using a refined image folksonomy

  • Authors:
  • Hyun-seok Min; Sihyoung Lee; Wesley De Neve; Yong Man Ro

  • Affiliations:
  • Image and Video Systems Lab, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea (all four authors)

  • Venue:
  • MMM '10: Proceedings of the 16th International Conference on Advances in Multimedia Modeling
  • Year:
  • 2010

Abstract

The automatic detection of semantic concepts is a key technology for efficient and effective video content management. Conventional techniques for semantic concept detection in video content still suffer from several interrelated issues: the semantic gap, the imbalanced data set problem, and a limited concept vocabulary size. In this paper, we propose to perform semantic concept detection for user-created video content using an image folksonomy in order to overcome these problems. An image folksonomy is well suited to this purpose for two reasons. First, it contains a vast amount of user-contributed images. Second, a significant portion of these images has been manually annotated by users with a wide variety of tags. However, user-supplied annotations in an image folksonomy are often characterized by a high level of noise. Therefore, we also discuss a tag refinement method that reduces the number of noisy tags in an image folksonomy by making use of tag co-occurrence statistics. To verify the effectiveness of the proposed video content annotation system, experiments were performed with user-created image and video content available on a number of social media applications. For the datasets used, video annotation with tag refinement achieves an average recall of 84% and an average precision of 75%, while video annotation without tag refinement achieves an average recall of 78% and an average precision of 62%.
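
The abstract does not spell out how tag co-occurrence statistics are turned into a refinement rule, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that a tag is noisy when it rarely co-occurs, across the folksonomy, with the other tags on the same image. The Jaccard coefficient, the function names, the threshold value, and the toy data are all assumptions introduced for this example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(tagged_images):
    """Count how many images carry each tag and each unordered tag pair."""
    tag_counts = Counter()
    pair_counts = Counter()
    for tags in tagged_images:
        unique = sorted(set(tags))  # sorted so each pair has one canonical key
        tag_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return tag_counts, pair_counts


def jaccard(a, b, tag_counts, pair_counts):
    """Jaccard co-occurrence of two tags (an assumed similarity measure)."""
    both = pair_counts[tuple(sorted((a, b)))]
    either = tag_counts[a] + tag_counts[b] - both
    return both / either if either else 0.0


def refine_tags(tags, tag_counts, pair_counts, threshold=0.3):
    """Keep tags whose mean co-occurrence with the image's other tags
    reaches the (hypothetical) threshold; drop the rest as noise."""
    unique = list(dict.fromkeys(tags))  # de-duplicate, keep original order
    if len(unique) < 2:
        return unique  # no context to judge a lone tag against
    kept = []
    for tag in unique:
        others = [t for t in unique if t != tag]
        score = sum(jaccard(tag, o, tag_counts, pair_counts) for o in others)
        if score / len(others) >= threshold:
            kept.append(tag)
    return kept


# Toy folksonomy: "nikon" describes the camera, not the image content.
folksonomy = [
    ["beach", "sea", "sand"],
    ["beach", "sea", "sun"],
    ["sea", "sand", "beach"],
    ["beach", "sea", "nikon"],
    ["city", "street", "nikon"],
    ["city", "street", "car"],
]
tag_counts, pair_counts = cooccurrence_counts(folksonomy)
print(refine_tags(["beach", "sea", "nikon"], tag_counts, pair_counts))
# prints ['beach', 'sea']
```

In this toy data, "beach" and "sea" always appear together, so each scores 0.6 against the other tags on the image, while "nikon" scores 0.2 and falls below the assumed threshold. A symmetric measure such as Jaccard is used here only for simplicity; a conditional-probability or mutual-information score would fit the same structure.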