Who, what, when, where, why?: comparing multiple approaches to the cross-lingual 5W task

  • Authors:
  • Kristen Parton; Kathleen R. McKeown; Bob Coyne; Mona T. Diab; Ralph Grishman; Dilek Hakkani-Tür; Mary Harper; Heng Ji; Wei Yun Ma; Adam Meyers; Sara Stolbach; Ang Sun; Gokhan Tur; Wei Xu; Sibel Yaman

  • Affiliations:
  • Columbia University, New York, NY; Columbia University, New York, NY; Columbia University, New York, NY; Columbia University, New York, NY; New York University, New York, NY; International Computer Science Institute, Berkeley, CA; Human Lang. Tech. Ctr. of Excellence, Johns Hopkins and U. of Maryland, College Park; City University of New York, New York, NY; Columbia University, New York, NY; New York University, New York, NY; Columbia University, New York, NY; New York University, New York, NY; SRI International, Palo Alto, CA; New York University, New York, NY; International Computer Science Institute, Berkeley, CA

  • Venue:
  • ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1
  • Year:
  • 2009

Abstract

Cross-lingual tasks are especially difficult due to the compounding effect of errors in language processing and errors in machine translation (MT). In this paper, we present an error analysis of a new cross-lingual task: the 5W task, a sentence-level understanding task which seeks to return the English 5W's (Who, What, When, Where and Why) corresponding to a Chinese sentence. We analyze systems that we developed, identifying specific problems in language processing and MT that cause errors. The best cross-lingual 5W system was still 19% worse than the best monolingual 5W system, which shows that MT significantly degrades sentence-level understanding. Neither source-language nor target-language analysis was able to circumvent problems in MT, although each approach had advantages relative to the other. A detailed error analysis across multiple systems suggests directions for future research on the problem.