Language Resources and Evaluation
We investigate a way to partially automate corpus annotation for named entity recognition by requiring only binary decisions from an annotator. Our approach is based on a linear sequence model trained with a k-best MIRA learning algorithm. We ask an annotator to decide whether each mention produced by a high-recall tagger is a true mention or a false positive. We conclude that our approach can reduce the effort of extending a seed training corpus by up to 58%.
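The binary-decision step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Mention` type, the `confirm_mentions` and `to_bio` helpers, and the BIO tag encoding are all hypothetical names and choices introduced here for illustration. The sketch assumes the high-recall tagger has already produced candidate mentions and that accepted mentions do not overlap.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Mention:
    """A candidate entity mention (hypothetical type, not from the paper)."""
    start: int  # index of the first token in the mention
    end: int    # index one past the last token in the mention
    label: str  # entity type proposed by the tagger


def confirm_mentions(
    tokens: Sequence[str],
    candidates: Sequence[Mention],
    is_true_mention: Callable[[Sequence[str], Mention], bool],
) -> List[Mention]:
    """Keep only the candidates the annotator accepts.

    Each call to is_true_mention stands for one binary decision:
    true mention or false positive.
    """
    return [m for m in candidates if is_true_mention(tokens, m)]


def to_bio(tokens: Sequence[str], mentions: Sequence[Mention]) -> List[str]:
    """Encode accepted mentions as BIO tags (assumes no overlapping mentions)."""
    tags = ["O"] * len(tokens)
    for m in mentions:
        tags[m.start] = "B-" + m.label
        for i in range(m.start + 1, m.end):
            tags[i] = "I-" + m.label
    return tags


# Toy example: the annotator is simulated by a lookup in a small gold set.
tokens = ["Alice", "visited", "New", "York", "in", "May"]
candidates = [
    Mention(0, 1, "PER"),
    Mention(2, 4, "LOC"),
    Mention(5, 6, "PER"),  # a false positive from the high-recall tagger
]
gold = {Mention(0, 1, "PER"), Mention(2, 4, "LOC")}

accepted = confirm_mentions(tokens, candidates, lambda toks, m: m in gold)
tags = to_bio(tokens, accepted)
```

In this toy run the annotator makes three binary decisions instead of labeling six tokens from scratch. In practice the simulated lookup would be replaced by a prompt shown to a human annotator.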