Signal boosting for translingual topic tracking: document expansion and n-best translation

Authors:
Gina-Anne Levow;Douglas W. Oard
Affiliations:
Department of Computer Science, University of Chicago;College of Information Studies and Institute for Advanced Computer Studies, University of Maryland, College Park, MD
Venue:
Topic detection and tracking
Year:
2002

Citing 5
Cited 4

A comparison of classifiers and document representations for the routing problem

SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
Phrasal translation and query expansion techniques for cross-language information retrieval

Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval
Document expansion for speech retrieval

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Effects of Term Segmentation on Chinese/English Cross-Language Information Retrieval

SPIRE '99 Proceedings of the String Processing and Information Retrieval Symposium & International Workshop on Groupware
Should we translate the documents or the queries in cross-language information retrieval?

ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics

Cross-Language Access to Recorded Speech in the MALACH Project

TSD '02 Proceedings of the 5th International Conference on Text, Speech and Dialogue
Language-specific models in multilingual topic tracking

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Dictionary-based techniques for cross-language information retrieval

Information Processing and Management: an International Journal - Special issue: Cross-language information retrieval
Statistical source expansion for question answering

Proceedings of the 20th ACM international conference on Information and knowledge management

Quantified Score

Hi-index	0.00

Visualization

Abstract

The University of Maryland participated in the TDT-1999 topic tracking task. This chapter describes the system architecture, including source-dependent normalization, and then focuses on the cross-language case in which English training stories were used to find Mandarin stories on the same topic. Processes that may introduce noise, including errorful translation and transcription, are described and five techniques for minimizing the impact of a reduced signal-to-noise ratio are identified. Three techniques focus on signal boosting: augmenting story representations with topically related terminology through "document expansion," exploiting knowledge of alternative translations using balanced n-best term translation, and enriching the bilingual term list to improve translation coverage. The remaining two techniques focus on noise reduction: removing common "stopwords" before translation and using corpus statistics to guide translation selection. Two of the signal boosting strategies yielded substantial gains using techniques that can be ported to other languages fairly easily, while outperforming state-of-the-art general-purpose machine translation. By contrast, neither of the noise reduction strategies produced significant improvements.