Topics inference by weighted mutual information measures computed from structured corpus

  • Authors:
  • Harry Chang

  • Affiliations:
  • AT&T Labs-Research, Austin, TX

  • Venue:
  • NLDB'11 Proceedings of the 16th international conference on Natural language processing and information systems
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper proposes a novel topic inference framework that is built on the scalability and adaptability of mutual information (MI) techniques. The framework is designed to systematically construct a more robust language model (LM) for topic-oriented search terms in the domain of electronic programming guide (EPG) for broadcast TV programs. The topic inference system identifies the most relevant topics implied from a search term, based on a simplified MI-based classifier trained from a highly structured XML-based text corpus, which is derived from continuously updated EPG data feeds. The proposed framework is evaluated against a set of EPG-specific queries from a large user population collected from a real world web-based IR system. The MI-base topic inference system is able to achieve 98 percent accuracy in recall measurement and 82 percent accuracy in precision measurement on the test set.