Consistency checking for Treebank alignment

Authors: Markus Dickinson; Yvonne Samuelsson
Affiliations: Indiana University; Stockholm University
Venue: LAW IV '10 Proceedings of the Fourth Linguistic Annotation Workshop
Year: 2010

Abstract

This paper explores ways to detect errors in aligned corpora, using very little technology. In the first method, applicable to any aligned corpus, we consider alignment as a string-to-string mapping. Treating the target string as a label, we examine each source string to find inconsistencies in alignment. Despite setting up the problem on a par with grammatical annotation, we demonstrate crucial differences in sorting errors from legitimate variations. The second method examines phrase nodes which are predicted to be aligned, based on the alignment of their yields. Both methods are effective in complementary ways.
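The two checks described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the data layout (string pairs, token-index yields, word links as index pairs), and the toy English-Swedish examples are all assumptions made for illustration. The first function treats each target string as a label on its source string and reports source strings that carry more than one label. The second predicts node alignments from yields under a simplified criterion (the word links of a source node's yield must map exactly onto a target node's yield, with no links leaving the source yield), so that predicted pairs absent from the annotation can be reviewed.

```python
from collections import Counter, defaultdict


def find_alignment_variations(aligned_pairs):
    """Return source strings that are aligned to more than one target string.

    aligned_pairs: iterable of (source_string, target_string) tuples.
    The target string plays the role of a label on the source string.
    """
    targets_by_source = defaultdict(Counter)
    for source, target in aligned_pairs:
        targets_by_source[source][target] += 1
    return {
        source: dict(counts)
        for source, counts in targets_by_source.items()
        if len(counts) > 1
    }


def predict_node_alignments(source_yields, target_yields, word_links):
    """Predict node alignments from word alignments (simplified criterion).

    source_yields / target_yields: dict of node id -> frozenset of token indices.
    word_links: set of (source_index, target_index) pairs.
    A pair is predicted when the target tokens linked to the source yield
    form exactly the target node's yield and link back only into the source yield.
    """
    nodes_by_target_yield = defaultdict(list)
    for node, node_yield in target_yields.items():
        nodes_by_target_yield[node_yield].append(node)

    predicted = set()
    for s_node, s_yield in source_yields.items():
        image = frozenset(t for s, t in word_links if s in s_yield)
        back = frozenset(s for s, t in word_links if t in image)
        if image and back <= s_yield:
            for t_node in nodes_by_target_yield.get(image, []):
                predicted.add((s_node, t_node))
    return predicted


# Hypothetical toy data for the first check.
pairs = [
    ("the house", "huset"),
    ("the house", "huset"),
    ("the house", "hus"),
    ("green", "grön"),
]
variations = find_alignment_variations(pairs)

# Hypothetical toy data for the second check:
# source "the house sleeps" (tokens 0-2), target "huset sover" (tokens 0-1).
word_links = {(0, 0), (1, 0), (2, 1)}
source_yields = {"NP_s": frozenset({0, 1}), "S_s": frozenset({0, 1, 2})}
target_yields = {"NP_t": frozenset({0}), "S_t": frozenset({0, 1})}
annotated = {("S_s", "S_t")}

predicted = predict_node_alignments(source_yields, target_yields, word_links)
missing = predicted - annotated  # predicted from yields but not annotated
```

In this toy data, "the house" surfaces as a variation because it is paired with two different target strings, and the noun phrase pair is flagged because its yields are fully word-aligned while the nodes themselves are not. As the abstract notes, such variation is not necessarily an error, so the output is a list of candidates for manual inspection rather than a set of corrections.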