Top-down cues for event recognition

  • Authors:
  • Li Li; Chunfeng Yuan; Weiming Hu; Bing Li

  • Affiliations:
  • Institute of Automation, Chinese Academy of Sciences, and Radio, Film and Television Design and Research Institute; Institute of Automation, Chinese Academy of Sciences; Institute of Automation, Chinese Academy of Sciences; Institute of Automation, Chinese Academy of Sciences

  • Venue:
  • ACCV'10: Proceedings of the 10th Asian Conference on Computer Vision - Volume Part III
  • Year:
  • 2010

Abstract

How to fuse static and dynamic information is a key issue in event analysis. In this paper, we present a novel approach that combines appearance and motion information in a top-down manner for event recognition in real videos. Unlike the conventional bottom-up way, attention can be focused volitionally on top-down signals derived from task demands. A video is represented by a collection of spatio-temporal features, called video words, obtained by quantizing the spatio-temporal interest points (STIPs) extracted from the video. We propose two approaches to build class-specific visual or motion histograms for the corresponding features. The first uses the probability of a class given a visual or motion word; a high probability means more attention should be paid to this word. The second, in order to incorporate the negative information carried by each word, utilizes the mutual information between each word and the event label; high mutual information means high relevance between the word and the class label. Both methods not only characterize two aspects of an event, but also select the relevant words, which are discriminative for the corresponding event. Experimental results on the TRECVID 2005 and the HOHA video corpora demonstrate that the proposed methods improve the mean average precision.
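
To make the two top-down cues concrete, the sketch below shows how class-specific word weights of this kind could be computed from a word-class co-occurrence matrix. This is a minimal illustration under our own assumptions, not the paper's implementation: the names (topdown_word_weights, counts) are hypothetical, and the paper's actual probability estimation may differ.

    import numpy as np

    def topdown_word_weights(counts):
        """Two top-down relevance weights for video words.

        counts : (n_words, n_classes) array, where counts[w, c] is the
        number of times video word w occurs in training videos of class c.
        """
        counts = np.asarray(counts, dtype=float)
        eps = 1e-12
        p_wc = counts / counts.sum()               # joint P(word = w, class = c)
        p_w = p_wc.sum(axis=1, keepdims=True)      # marginal P(word = w)
        p_c = p_wc.sum(axis=0, keepdims=True)      # marginal P(class = c)

        # Cue 1: posterior P(class | word); a high value means the word
        # deserves more attention for that class.
        p_class_given_word = p_wc / np.maximum(p_w, eps)

        # Cue 2: mutual information I(X_w; C) between the binary variable
        # X_w ("a drawn word token equals w") and the class label C.
        # The absent-word term contributes the negative information:
        # a word that rarely occurs in a class is itself evidence.
        mi_present = p_wc * np.log((p_wc + eps) / (p_w * p_c + eps))
        p_absent_c = p_c - p_wc                    # P(X_w = 0, class = c)
        p_absent = 1.0 - p_w                       # P(X_w = 0)
        mi_absent = p_absent_c * np.log((p_absent_c + eps) / (p_absent * p_c + eps))
        mutual_info = mi_present + mi_absent

        return p_class_given_word, mutual_info

    # Toy example: 4 video words, 2 event classes.
    counts = np.array([[30,  2],
                       [ 5, 25],
                       [10, 10],
                       [ 1,  1]])
    posterior, mi = topdown_word_weights(counts)

Under these assumptions, a class-specific histogram for a test video could then be formed by multiplying its raw word histogram elementwise by the weight column of the hypothesized class, so that words with high posterior probability or high mutual information dominate the representation.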