This work aims to improve an N-gram-based statistical machine translation system between Catalan and Spanish, trained on an aligned Spanish-Catalan parallel corpus of 1.7 million sentences taken from the newspaper El Periódico. Starting from a linguistic error analysis of this baseline system, orthographic, morphological, lexical, semantic and syntactic problems are addressed with a set of techniques. The proposed solutions include the development and application of additional statistical techniques, text pre- and post-processing tasks, rules based on grammatical categories, and lexical categorization. The improved system clearly outperforms the baseline in both human and automatic evaluations, with a gain of about 1.1 BLEU points in the Spanish-to-Catalan direction and about 0.5 BLEU points in the reverse direction. The final system is freely available online as a linguistic resource.