This paper presents a project whose main goal is to construct a corpus of clinical text manually annotated for part-of-speech (POS) information. We describe and discuss the process of training three domain experts to perform linguistic annotation. We report some of the challenges encountered, as well as encouraging results on inter-rater agreement and annotation consistency. We also present preliminary experimental results indicating that state-of-the-art POS taggers need to be adapted to the sublanguage domain of medical text.
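As an illustration of how inter-rater agreement on POS tags can be quantified, the following minimal sketch computes Cohen's kappa for one pair of annotators. The abstract does not state which agreement measure the project used, so the choice of kappa is an assumption here, and the function name `cohen_kappa` and the toy tag sequences are hypothetical. With three annotators, one option would be to report the measure for each pair.

```python
from collections import Counter


def cohen_kappa(tags_a, tags_b):
    """Cohen's kappa for two annotators who tagged the same tokens.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("Both annotators must tag the same non-empty token list.")

    n = len(tags_a)

    # Observed agreement: fraction of tokens given the same tag by both.
    observed = sum(a == b for a, b in zip(tags_a, tags_b)) / n

    # Chance agreement: for each tag, the product of the two annotators'
    # marginal proportions, summed over all tags.
    freq_a = Counter(tags_a)
    freq_b = Counter(tags_b)
    expected = sum(freq_a[tag] * freq_b[tag] for tag in freq_a) / (n * n)

    # Degenerate case: both annotators used a single identical tag throughout.
    if expected == 1.0:
        return 1.0

    return (observed - expected) / (1.0 - expected)


# Toy example with made-up tags for four tokens.
annotator_1 = ["NN", "VB", "NN", "JJ"]
annotator_2 = ["NN", "VB", "JJ", "JJ"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw percentage agreement for the agreement expected by chance, which matters when a few tags (for example, nouns) dominate the corpus.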