Recognizing human-human interaction activities using visual and textual information

  • Authors:
  • Sunyoung Cho; Sooyeong Kwak; Hyeran Byun

  • Affiliations:
  • Department of Computer Science, Yonsei University, Shinchon-Dong, Seodaemun-Gu, Seoul 120-749, Republic of Korea; Department of Electronic and Control Engineering, Hanbat National University, 125, Dongseo-daero, Yuseong-Gu, Daejeon 305-719, Republic of Korea; Department of Computer Science, Yonsei University, Shinchon-Dong, Seodaemun-Gu, Seoul 120-749, Republic of Korea

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2013

Abstract

We exploit textual information for recognizing human-human interaction activities in YouTube videos. YouTube videos are generally accompanied by various types of textual information, such as the title, description, and tags. In particular, since some tags describe the visual content of a video, making good use of them can aid activity recognition. The proposed method uses two types of information for activity recognition: (i) visual information, namely correlations among activities, human poses, configurations of human body parts, and image features extracted from the visual content, and (ii) textual information, namely correlations between activities and words extracted from tags. For tag analysis, we discover a set of relevant tags and extract the meaningful words from them. Correlations between words and activities are learned from expanded tags obtained from the tags of related videos. We develop a model that jointly captures both types of information for activity recognition. We formulate the problem as a structured learning task with latent variables and estimate the model parameters using a non-convex minimization procedure. The proposed approach is evaluated on a dataset of highly challenging real-world videos and their assigned tags collected from YouTube. Experimental results demonstrate that, by exploiting visual and textual information in a structured framework, the proposed method significantly improves activity recognition results.
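
To make the shape of such a joint visual-textual model concrete, the sketch below shows one way inference could combine a visual term (maximized over a latent pose) with a tag-based textual term. This is only an illustrative sketch, not the authors' implementation: the feature dimensions, the candidate pose set, the word-activity correlation matrix, and the helper names (textual_features, score, predict) are all hypothetical placeholders.

```python
# Minimal sketch (assumed structure, not the paper's actual model):
# activity score = visual score with a latent pose + textual score from tag words.
import numpy as np

N_ACTIVITIES, N_POSES, D_VIS, D_TXT = 5, 8, 64, 300  # hypothetical sizes

rng = np.random.default_rng(0)
w_vis = rng.normal(size=(N_ACTIVITIES, N_POSES, D_VIS))  # visual weights per (activity, pose)
w_txt = rng.normal(size=(N_ACTIVITIES, D_TXT))           # textual weights per activity
word_activity_corr = rng.random((D_TXT, N_ACTIVITIES))   # stand-in for correlations learned from expanded tags

def textual_features(tag_word_ids, activity):
    """Bag-of-words feature weighted by word-activity correlations (assumed form)."""
    feat = np.zeros(D_TXT)
    for w in tag_word_ids:
        feat[w] += word_activity_corr[w, activity]
    return feat

def score(vis_feat, tag_word_ids, activity):
    """Joint score for one activity; the pose is latent and maximized over."""
    pose_scores = w_vis[activity] @ vis_feat          # one score per candidate pose
    best_pose = int(np.argmax(pose_scores))           # latent-variable maximization
    txt_score = w_txt[activity] @ textual_features(tag_word_ids, activity)
    return pose_scores[best_pose] + txt_score, best_pose

def predict(vis_feat, tag_word_ids):
    """Return the activity label with the highest joint visual-textual score."""
    scores = [score(vis_feat, tag_word_ids, a)[0] for a in range(N_ACTIVITIES)]
    return int(np.argmax(scores))

# Toy usage with random inputs
video_feat = rng.normal(size=D_VIS)
tag_words = [3, 17, 42]           # indices of words extracted from the video's tags
print(predict(video_feat, tag_words))
```

In an actual latent structured-learning setup, the weights would be estimated from training videos (e.g., via a non-convex procedure that alternates latent-variable inference and parameter updates) rather than drawn at random as in this toy example.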