Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
A reference ontology for biomedical informatics: the foundational model of anatomy
Journal of Biomedical Informatics - Special issue: Unified medical language system
Biomedical informatics and granularity: Conference Papers
Comparative and Functional Genomics
HLT '91 Proceedings of the workshop on Speech and Natural Language
On carcinomas and other pathological entities: Research Articles
Comparative and Functional Genomics
Introduction to the CoNLL-2003 shared task: language-independent named entity recognition
CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
The GENIA corpus: an annotated research abstract corpus in molecular biology domain
HLT '02 Proceedings of the second international conference on Human Language Technology Research
Introduction to the bio-entity recognition task at JNLPBA
JNLPBA '04 Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications
BioNLP '10 Proceedings of the 2010 Workshop on Biomedical Natural Language Processing
Desiderata for ontologies to be used in semantic annotation of biomedical documents
Journal of Biomedical Informatics
Bioinformatics
CoNLL-2011 shared task: modeling unrestricted coreference in OntoNotes
CONLL Shared Task '11 Proceedings of the Fifteenth Conference on Computational Natural Language Learning: Shared Task
BRAT: a web-based tool for NLP-assisted text annotation
EACL '12 Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics
Hi-index | 0.00 |
Anatomical entities such as kidney, muscle and blood are central to much of biomedical scientific discourse, and the detection of mentions of anatomical entities is thus necessary for the automatic analysis of the structure of domain texts. Although a number of resources and methods addressing aspects of the task have been introduced, there have so far been no annotated corpora for training and evaluating systems for broad-coverage, open-domain anatomical entity mention detection. We introduce the AnEM corpus, a domain- and species-independent resource manually annotated for anatomical entity mentions using a fine-grained classification system. The corpus texts are selected randomly from citation abstracts and full-text papers with the aim of making the corpus representative of the entire available biomedical scientific literature. We demonstrate the use of the corpus through an evaluation of the broad-coverage MetaMap tagger and a CRF-based system trained on the corpus data, considering also a combination of these two methods. The combined system demonstrates a promising level of performance, approaching 80% F-score for mention detection for a relaxed matching criterion. The corpus and other introduced resources are available under open licences from http://www.nactem.ac.uk/anatomy/.