An automatic method for generating sense tagged corpora
AAAI '99/IAAI '99 Proceedings of the sixteenth national conference on Artificial intelligence and the eleventh Innovative applications of artificial intelligence conference innovative applications of artificial intelligence
Numerical Recipes in C++: the art of scientific computing
Numerical Recipes in C++: the art of scientific computing
A systematic comparison of various statistical alignment models
Computational Linguistics
A statistical approach to language translation
COLING '88 Proceedings of the 12th conference on Computational linguistics - Volume 1
Precision and recall of machine translation
NAACL-Short '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: companion volume of the Proceedings of HLT-NAACL 2003--short papers - Volume 2
Automatic evaluation of machine translation quality using n-gram co-occurrence statistics
HLT '02 Proceedings of the second international conference on Human Language Technology Research
CICLing '09 Proceedings of the 10th International Conference on Computational Linguistics and Intelligent Text Processing
Hi-index | 0.00 |
This paper studies the enrichment of Spanish WordNet with synset glosses automatically obtained from the English Word-Net glosses using a phrase-based Statistical Machine Translation system. We construct the English-Spanish translation system from a parallel corpus of proceedings of the European Parliament, and study how to adapt statistical models to the domain of dictionary definitions. We build specialized language and translation models from a small set of parallel definitions and experiment with robust manners to combine them. A statistically significant increase in performance is obtained. The best system is finally used to generate a definition for all Spanish synsets, which are currently ready for a manual revision. As a complementary issue, we analyze the impact of the amount of in-domain data needed to improve a system trained entirely on out-of-domain data.