Ensemble multi-instance multi-label learning approach for video annotation task

  • Authors:
  • Xin-Shun Xu;Xiangyang Xue;Zhi-Hua Zhou

  • Affiliations:
  • Nanjing University & Shandong University, Nanjing, China;Fudan University, Shanghai, China;Nanjing University, Nanjing, China

  • Venue:
  • MM '11 Proceedings of the 19th ACM international conference on Multimedia
  • Year:
  • 2011

Abstract

Automatic video annotation is an important component of video indexing, browsing, and retrieval. Traditional approaches represent a video clip as a single flat feature vector, although video data usually has natural structure. Moreover, a video clip is generally relevant to multiple concepts, so the video annotation task is inherently a Multi-Instance Multi-Label (MIML) learning problem. In this paper, we propose the En-MIMLSVM approach for the video annotation task. It addresses the class imbalance and long training times that affect most video annotation tasks. In addition, a temporally consistent weighted multi-instance kernel is developed to take into account both the temporal consistency in video data and the significance of instances at different levels of the pyramid representation. En-MIMLSVM is evaluated on the TRECVID 2005 data set, and the results show that it outperforms several state-of-the-art methods.
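
The abstract does not give the kernel's formula, so the following is only a minimal sketch of a generic weighted multi-instance (set) kernel, not the paper's temporally consistent kernel. It assumes each clip is a bag of instance feature vectors with one non-negative weight per instance (for example, a weight tied to the instance's pyramid level), and it sums a Gaussian base kernel over all instance pairs. The function names, the `gamma` parameter, and the normalization step are illustrative assumptions; temporal consistency is not modeled here.

```python
import numpy as np

def rbf(X, Y, gamma=0.5):
    # Pairwise Gaussian base kernel between rows of X and rows of Y.
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def weighted_mi_kernel(A, wA, B, wB, gamma=0.5):
    # Hypothetical weighted set kernel: sum of w_i * v_j * k(a_i, b_j)
    # over every instance pair drawn from bag A and bag B.
    return float(wA @ rbf(A, B, gamma) @ wB)

def normalized_mi_kernel(A, wA, B, wB, gamma=0.5):
    # Normalize so that a bag compared with itself scores 1
    # (an assumed design choice, not taken from the paper).
    kab = weighted_mi_kernel(A, wA, B, wB, gamma)
    kaa = weighted_mi_kernel(A, wA, A, wA, gamma)
    kbb = weighted_mi_kernel(B, wB, B, wB, gamma)
    return kab / np.sqrt(kaa * kbb)
```

A Gram matrix built from `normalized_mi_kernel` over all pairs of bags could then be passed to an SVM that accepts precomputed kernels, with one binary classifier per concept label.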