Using 'smart' bilingual projection to feature-tag a monolingual dictionary

Authors:
Katharina Probst
Affiliations:
Carnegie Mellon University, Pittsburgh, PA
Venue:
CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
Year:
2003

Abstract

We describe an approach to tagging a monolingual dictionary with linguistic features. In particular, we annotate the dictionary entries with part-of-speech, number, and tense information. The algorithm uses a bilingual corpus as well as a statistical lexicon to find candidate training examples for specific feature values (e.g. plural). A similarity measure in the space defined by the training data then serves to define a classifier for unseen data. We report evaluation results for a French dictionary, but the approach is general enough to be applied to any language pair.

In a further step, we show that the proposed framework can be used to assign linguistic roles to extracted morphemes, e.g. noun plural markers. While the morphemes can be extracted using any algorithm, we present a simple algorithm for doing so. The emphasis here is not on the algorithm itself, but on the power of the framework to assign roles, which are ultimately indispensable for tasks such as Machine Translation.
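
The pipeline outlined in the abstract (project feature values through a statistical lexicon to collect training examples, then classify unseen dictionary entries by similarity to those examples) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy word lists, the translation probabilities, the `min_prob` threshold, the use of word-final character n-grams as the feature space, and cosine similarity as the similarity measure. The abstract does not specify the paper's actual features or measure, and the sketch replaces the bilingual corpus with a small list of English words already tagged for number.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy data: English words tagged for number.
ENGLISH_NUMBER = {
    "houses": "plural", "books": "plural", "dogs": "plural",
    "house": "singular", "book": "singular", "dog": "singular",
}

# Hypothetical statistical lexicon: English word -> {French word: probability}.
LEXICON = {
    "houses": {"maisons": 0.9, "maison": 0.1},
    "books": {"livres": 0.8, "livre": 0.2},
    "dogs": {"chiens": 0.9},
    "house": {"maison": 0.9},
    "book": {"livre": 0.85, "livres": 0.15},
    "dog": {"chien": 0.9},
}


def project_labels(tags, lexicon, min_prob=0.5):
    """Collect French training examples per feature value by projecting
    English tags through sufficiently probable lexicon entries."""
    examples = {}
    for en_word, value in tags.items():
        for fr_word, prob in lexicon.get(en_word, {}).items():
            if prob >= min_prob:
                examples.setdefault(value, []).append(fr_word)
    return examples


def suffix_features(word, max_len=3):
    """Represent a word by its word-final character n-grams (assumed feature space)."""
    return Counter(word[-n:] for n in range(1, max_len + 1) if len(word) >= n)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def train(examples):
    """Build one summed feature profile per feature value."""
    profiles = {}
    for value, words in examples.items():
        profile = Counter()
        for word in words:
            profile.update(suffix_features(word))
        profiles[value] = profile
    return profiles


def classify(word, profiles):
    """Assign the feature value whose profile is most similar to the word."""
    feats = suffix_features(word)
    return max(profiles, key=lambda value: cosine(feats, profiles[value]))


examples = project_labels(ENGLISH_NUMBER, LEXICON)
profiles = train(examples)
print(classify("tables", profiles))
print(classify("jardin", profiles))
```

In this toy setting the unseen French words are labeled purely by how much their endings overlap with the endings of the projected training examples. A real system would need a far larger lexicon and a feature space suited to the target language.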