Creating algorithms for parsers and taggers for resource-poor languages using a related resource-rich language

  • Authors:
  • Eugene Charniak; Dmitriy Genzel

  • Affiliations:
  • Brown University; Brown University

  • Venue:
  • Creating algorithms for parsers and taggers for resource-poor languages using a related resource-rich language
  • Year:
  • 2006

Abstract

Modern statistical natural language processing techniques require large amounts of human-annotated data to work well. For practical reasons, that much data exists for only a few languages of major interest. In my work I show how a resource-rich language can be leveraged to produce the necessary resources and tools for related resource-poor languages. The work consists of two parts. The first part focuses on building a word-to-word translation model from parallel corpora, using a variety of methods, some well known and some new. The new methods exploit lexical and syntactic similarities between the languages. The second part uses the word-to-word model from the first part to assign parts of speech to, and then parse, text in several related resource-poor languages.
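The abstract does not specify how the word-to-word model is trained. As an illustration of one well-known way to learn word-to-word translation probabilities from a parallel corpus, the Python sketch below implements IBM Model 1 estimated with expectation-maximization. It is an assumption that the thesis uses something similar as a baseline; the function names `train_model1` and `translate`, the `<NULL>` token, and the toy sentence pairs are hypothetical and chosen for illustration only. The new lexical- and syntactic-similarity methods described in the abstract are not shown.

```python
from collections import defaultdict

# Illustrative IBM Model 1 sketch (an assumption), not the thesis's own code.
NULL = "<NULL>"  # hypothetical empty source word, lets target words stay unaligned


def train_model1(pairs, iterations=10):
    """Estimate t(tgt_word | src_word) with EM.

    pairs: list of (src_tokens, tgt_tokens) sentence pairs.
    Returns a dict keyed by (tgt_word, src_word).
    """
    tgt_vocab = {f for _, tgt in pairs for f in tgt}
    uniform = 1.0 / len(tgt_vocab)
    t = defaultdict(lambda: uniform)  # uniform initialisation

    for _ in range(iterations):
        counts = defaultdict(float)  # expected count c(f, e)
        totals = defaultdict(float)  # expected count c(e)
        for src, tgt in pairs:
            src_with_null = [NULL] + list(src)
            for f in tgt:
                # E-step: posterior over which source word generated f.
                norm = sum(t[(f, e)] for e in src_with_null)
                for e in src_with_null:
                    p = t[(f, e)] / norm
                    counts[(f, e)] += p
                    totals[e] += p
        # M-step: renormalise expected counts for each source word.
        t = defaultdict(float, {(f, e): c / totals[e] for (f, e), c in counts.items()})
    return dict(t)


def translate(t, src_word):
    """Return the target word with the highest t(tgt | src_word)."""
    candidates = {f: p for (f, e), p in t.items() if e == src_word}
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    # Toy parallel corpus, purely illustrative.
    pairs = [
        (["das", "haus"], ["the", "house"]),
        (["das", "buch"], ["the", "book"]),
        (["ein", "buch"], ["a", "book"]),
    ]
    model = train_model1(pairs)
    for word in ["das", "haus", "buch", "ein"]:
        print(word, "->", translate(model, word))
```

On this toy corpus, EM gradually shifts probability mass so that each source word favors the target word it co-occurs with most consistently (for example, "das" with "the"). A model of this kind uses only co-occurrence statistics, which is why extra signals such as lexical similarity between related languages could plausibly help when parallel data is scarce.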