Interlingual annotation of parallel text corpora: A new framework for annotation and evaluation

  • Authors:
  • Bonnie J. Dorr; Rebecca J. Passonneau; David Farwell; Rebecca Green; Nizar Habash; Stephen Helmreich; Eduard Hovy; Lori Levin; Keith J. Miller; Teruko Mitamura; Owen Rambow; Advaith Siddharthan

  • Affiliations:
  • Institute for Advanced Computer Studies, University of Maryland, A.V. Williams Building 3153, College Park, MD 20742, USA e-mail: bonnie@umiacs.umd.edu; Center for Computational Learning Systems, Columbia University, 475 Riverside Drive MC 7717, New York, NY 10115, USA e-mails: becky@cs.columbia.edu, habash@cs.columbia.edu, rambow@cs.columbia.edu; Computing Research Laboratory, New Mexico State University, Las Cruces, NM 88001, USA e-mails: david@crl.nmsu.edu, shelmrei@crl.nmsu.edu; OCLC Online Computer Library Center, Inc., 6565 Kilgour Place, Dublin, OH 43017-3395, USA e-mail: greenre@oclc.org; Center for Computational Learning Systems, Columbia University, 475 Riverside Drive MC 7717, New York, NY 10115, USA e-mails: becky@cs.columbia.edu, habash@cs.columbia.edu, rambow@cs.columbia.edu; Computing Research Laboratory, New Mexico State University, Las Cruces, NM 88001, USA e-mails: david@crl.nmsu.edu, shelmrei@crl.nmsu.edu; Information Sciences Institute, University of Southern California, Marina del Rey, CA 90292, USA e-mail: hovy@isi.edu; Language Technologies Institute, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213-3890, USA e-mails: lsl@cs.cmu.edu, teruko@cs.cmu.edu; The MITRE Corporation, 7515 Colshire Drive, McLean, VA 22102-7539, USA e-mail: freeder@mitre.org, keith@mitre.org; Language Technologies Institute, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213-3890, USA e-mails: lsl@cs.cmu.edu, teruko@cs.cmu.edu; Center for Computational Learning Systems, Columbia University, 475 Riverside Drive MC 7717, New York, NY 10115, USA e-mails: becky@cs.columbia.edu, habash@cs.columbia.edu, rambow@cs.columbia.edu; Department of Computing Science, University of Aberdeen, Aberdeen, AB24 3UE, Scotland, UK e-mail: advaith@abdn.ac.uk

  • Venue:
  • Natural Language Engineering
  • Year:
  • 2010

Abstract

This paper focuses on an important step in the creation of a system of meaning representation and the development of semantically annotated parallel corpora, for use in applications such as machine translation, question answering, text summarization, and information retrieval. The work described below constitutes the first effort of any kind to annotate multiple translations of foreign-language texts with interlingual content. Three levels of representation are introduced: deep syntactic dependencies (IL0), intermediate semantic representations (IL1), and a normalized representation that unifies conversives, nonliteral language, and paraphrase (IL2). The resulting annotated, multilingually induced, parallel corpora will be useful as an empirical basis for a wide range of research, including the development and evaluation of interlingual NLP systems and paraphrase-extraction systems as well as a host of other research and development efforts in theoretical and applied linguistics, foreign language pedagogy, translation studies, and other related disciplines.