Signal boosting for translingual topic tracking: document expansion and n-best translation

  • Authors: Gina-Anne Levow; Douglas W. Oard

  • Affiliations: Department of Computer Science, University of Chicago; College of Information Studies and Institute for Advanced Computer Studies, University of Maryland, College Park, MD

  • Venue: Topic detection and tracking
  • Year: 2002

Abstract

The University of Maryland participated in the TDT-1999 topic tracking task. This chapter describes the system architecture, including source-dependent normalization, and then focuses on the cross-language case in which English training stories were used to find Mandarin stories on the same topic. Processes that may introduce noise, including errorful translation and transcription, are described, and five techniques for minimizing the impact of a reduced signal-to-noise ratio are identified. Three techniques focus on signal boosting: augmenting story representations with topically related terminology through "document expansion," exploiting knowledge of alternative translations using balanced n-best term translation, and enriching the bilingual term list to improve translation coverage. The remaining two techniques focus on noise reduction: removing common "stopwords" before translation and using corpus statistics to guide translation selection. Two of the signal boosting strategies yielded substantial gains using techniques that can be ported to other languages fairly easily, while outperforming state-of-the-art general-purpose machine translation. By contrast, neither of the noise reduction strategies produced significant improvements.
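The abstract names balanced n-best term translation without spelling out the mechanics. The sketch below is one plausible reading, not the authors' implementation: it assumes that each source term's weight is split evenly across its top n candidate translations, so that terms with many translations do not dominate the translated representation. The function name `balanced_nbest_translate`, the dictionary-based term list, and the choice to drop untranslatable terms are all hypothetical and introduced only for illustration.

```python
from collections import defaultdict


def balanced_nbest_translate(term_weights, term_list, n=2):
    """Translate a weighted bag of terms using up to n translations per term.

    Assumption (not taken from the source): "balanced" means each source
    term's weight is divided equally among the translations that are kept,
    so every source term contributes the same total weight after translation.
    """
    translated = defaultdict(float)
    for term, weight in term_weights.items():
        # Keep at most the first n candidates; the term list is assumed
        # to be ordered from most to least preferred translation.
        candidates = term_list.get(term, [])[:n]
        if not candidates:
            continue  # untranslatable terms are simply dropped in this sketch
        share = weight / len(candidates)
        for target in candidates:
            translated[target] += share
    return dict(translated)


# Toy example with made-up terms and weights.
source_weights = {"a": 2.0, "b": 1.0}
toy_term_list = {"a": ["x", "y", "z"], "b": ["x"]}
result = balanced_nbest_translate(source_weights, toy_term_list, n=2)
```

In the toy example, term "a" keeps only its first two translations and gives half of its weight to each, while "b" gives its full weight to its single translation. Splitting rather than copying the weight is the design choice that keeps the total contribution of each source term constant regardless of how many translations it has.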