Creating algorithms for parsers and taggers for resource-poor languages using a related resource-rich language

Authors:
Eugene Charniak;Dmitriy Genzel
Affiliations:
Brown University;Brown University
Venue:
Creating algorithms for parsers and taggers for resource-poor languages using a related resource-rich language
Year:
2006

Abstract

Modern statistical natural language processing techniques require large amounts of human-annotated data to work well. For practical reasons, data in the required quantity exists only for a few languages of major interest. In my work I show how a resource-rich language can be leveraged to produce the necessary resources and tools for related resource-poor languages. The work consists of two parts. The first part focuses on building a word-to-word translation model from parallel corpora. This involves a variety of methods, some well known and some new. The new methods exploit lexical and syntactic similarities between the languages. The second part uses the word-to-word model created in the first part to assign parts of speech to text in several related resource-poor languages and then to parse it.