A month to topic detection and tracking in Hindi

  • Authors:
  • James Allan; Victor Lavrenko; Margaret E. Connell

  • Affiliations:
  • University of Massachusetts, Amherst, MA; University of Massachusetts, Amherst, MA; University of Massachusetts, Amherst, MA

  • Venue:
  • ACM Transactions on Asian Language Information Processing (TALIP)
  • Year:
  • 2003

Abstract

We describe a one-month effort (June 2003) to create a topic detection and tracking (TDT) system for news stories in Hindi. The University of Massachusetts submitted results for three different TDT tasks in the DARPA surprise language evaluation. The official task was topic tracking, but we also provided results for the new event detection and topic detection (clustering) tasks. Our approach to all three tasks was based on the vector-space model of information retrieval. We also describe how we created the relevance judgments used to evaluate the system. Results suggest that topic tracking effectiveness is comparable to that of TDT tracking systems in other languages. Results for clustering and new event detection indicate that parameter settings for those tasks are sensitive to the language being used.
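
To make the vector-space approach mentioned in the abstract concrete, here is a minimal sketch of topic tracking with TF-IDF vectors and cosine similarity. It is an illustration under stated assumptions, not the authors' system: the smoothed IDF formula, the centroid topic representation, the 0.2 threshold, the toy English tokens, and all function names (`tfidf_vectors`, `cosine`, `centroid`, `is_on_topic`) are hypothetical choices made for this example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumption; many weighting variants exist).
    idf = {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Average a list of sparse vectors into a single topic vector."""
    total = Counter()
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    return {t: w / len(vectors) for t, w in total.items()}


def is_on_topic(topic_vec, story_vec, threshold=0.2):
    """Tracking decision: the story is on topic if it is similar enough to the topic."""
    return cosine(topic_vec, story_vec) >= threshold


# Toy stories (hypothetical data; real input would be tokenized Hindi news text).
stories = [
    ["flood", "river", "rescue", "village"],      # training story for the topic
    ["flood", "rescue", "army", "river"],         # training story for the topic
    ["election", "vote", "party", "minister"],    # unrelated story
    ["river", "flood", "relief", "camp"],         # later story on the same event
]
vectors = tfidf_vectors(stories)
topic = centroid(vectors[:2])

unrelated_decision = is_on_topic(topic, vectors[2])
related_decision = is_on_topic(topic, vectors[3])
```

In a sketch like this, the threshold is the kind of parameter that would need retuning per language and per task, which is consistent with the sensitivity the abstract reports for clustering and new event detection.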