This paper explores ways to detect errors in aligned corpora, using very little technology. In the first method, applicable to any aligned corpus, we consider alignment as a string-to-string mapping. Treating the target string as a label, we examine each source string to find inconsistencies in alignment. Despite setting up the problem on a par with grammatical annotation, we demonstrate crucial differences in sorting errors from legitimate variations. The second method examines phrase nodes which are predicted to be aligned, based on the alignment of their yields. Both methods are effective in complementary ways.
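
The first method can be illustrated with a minimal sketch, assuming the aligned corpus has already been reduced to (source string, target string) pairs. It groups the pairs by source string, treats each target string as that source's label, and reports labels that are rare relative to the other labels for the same source. The function name `find_inconsistent_alignments`, the `max_share` threshold, and the toy data are hypothetical and not taken from the paper. A frequency cutoff is only a crude stand-in for the harder step the abstract points to, separating real errors from legitimate translation variation.

```python
from collections import Counter, defaultdict


def find_inconsistent_alignments(pairs, max_share=0.2):
    """Flag rare target labels among sources aligned to several targets.

    pairs     -- iterable of (source_string, target_string) alignments
    max_share -- hypothetical cutoff: a target is suspect if it accounts
                 for at most this share of its source's alignments
    """
    # Count how often each source string is aligned to each target string.
    targets_by_source = defaultdict(Counter)
    for source, target in pairs:
        targets_by_source[source][target] += 1

    suspects = []
    for source, counts in targets_by_source.items():
        if len(counts) < 2:
            continue  # only one label seen: nothing inconsistent to report
        total = sum(counts.values())
        for target, n in counts.items():
            if n / total <= max_share:
                suspects.append((source, target, n, total))
    return sorted(suspects)


# Toy data (invented for illustration only).
pairs = (
    [("the house", "das Haus")] * 9
    + [("the house", "Haus")]      # rare variant: candidate error
    + [("bank", "Bank")] * 3
    + [("bank", "Ufer")] * 3       # balanced variation: likely legitimate
)

for source, target, n, total in find_inconsistent_alignments(pairs):
    print(f"{source!r} -> {target!r}: {n} of {total} alignments")
```

On this toy data, only the alignment of "the house" to "Haus" is reported. The evenly split translations of "bank" are left alone, which shows why variation by itself cannot be read as an error in alignment data.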