Wikipedia based news video topic modeling for information extraction

  • Authors:
  • Sujoy Roy;Mun-Thye Mak;Kong Wah Wan

  • Affiliations:
  • Institute for Infocomm Research, A*STAR, Singapore;Institute for Infocomm Research, A*STAR, Singapore;Institute for Infocomm Research, A*STAR, Singapore

  • Venue:
  • MMM'11 Proceedings of the 17th international conference on Advances in multimedia modeling - Volume Part II
  • Year:
  • 2011

Abstract

Determining the topic of a news video story (NVS) from its audio-visual footage is an important part of metadata generation. In this paper we propose a news story topic modeling approach that takes advantage of online knowledge resources such as Wikipedia to model the topic of a news story. An NVS is modeled as a distribution over several Wikipedia pages related to the story. The mapping of the NVS to the table of contents (TOC) of a Wikipedia page is also determined. The specific advantages of this topic modeling approach are: (1) The topic is interpretable as a weighted distribution over a set of semantically meaningful story title phrases instead of just being a collection of words. (2) It facilitates organizing news video stories as a taxonomy that captures several perspectives on the story. (3) The taxonomy facilitates exploration and non-linear search. Performance evaluations from an information extraction perspective validate the efficacy of the proposed topic modeling approach compared to TF-IDF and LDA based approaches on a large news video corpus.
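
The idea of a story represented as a weighted distribution over Wikipedia page titles can be illustrated with a minimal sketch. This is not the authors' method, which the abstract does not specify. It assumes a plain bag-of-words cosine similarity between a story transcript and each candidate page, normalized so the weights sum to one. The function names, the page titles, and the toy page texts are all hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase, whitespace-split word counts (an assumed, simplistic tokenizer)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def topic_distribution(transcript, pages):
    """Map a story transcript to normalized weights over page titles.

    `pages` is a dict of {page title: page text}. The result is a dict of
    {page title: weight}; the weights sum to 1 when any page overlaps the story.
    """
    story = bag_of_words(transcript)
    scores = {title: cosine(story, bag_of_words(text)) for title, text in pages.items()}
    total = sum(scores.values())
    if total == 0:
        return {title: 0.0 for title in pages}
    return {title: score / total for title, score in scores.items()}


# Toy data, invented for illustration only.
pages = {
    "2010 FIFA World Cup": "football world cup match goal team final",
    "Stock market": "shares market index trading investors stock",
}
dist = topic_distribution("the team scored a late goal in the world cup final", pages)
```

Here `dist` is keyed by page title, so the topic reads as weighted title phrases rather than a bag of words, which is the interpretability property the abstract highlights.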