Discriminant ranking for efficient treebanking

Authors:
Yi Zhang; Valia Kordoni
Affiliations:
Saarland University; Saarland University
Venue:
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Year:
2010

Abstract

Treebank annotation is a labor-intensive and time-consuming task. In this paper, we show that a simple statistical ranking model can significantly improve treebanking efficiency by directing human annotators, who are well trained in disambiguation tasks for treebanking but are not necessarily grammar experts, to the most relevant linguistic disambiguation decisions. Experiments were carried out to evaluate the impact of such techniques on annotation efficiency and quality. A detailed analysis of the ranking model's output shows a strong correlation with human annotator behavior. When integrated into the treebanking environment, the model brings a significant annotation speed-up with improved inter-annotator agreement.