Open-domain anatomical entity mention detection

Authors:
Tomoko Ohta; Sampo Pyysalo; Jun'ichi Tsujii; Sophia Ananiadou
Affiliations:
National Centre for Text Mining and University of Manchester, Manchester, UK; National Centre for Text Mining and University of Manchester, Manchester, UK; Microsoft Research Asia, Beijing, China; National Centre for Text Mining and University of Manchester, Manchester, UK
Venue:
ACL '12 Proceedings of the Workshop on Detecting Structure in Scholarly Discourse
Year:
2012

Abstract

Anatomical entities such as kidney, muscle and blood are central to much of biomedical scientific discourse, and detecting mentions of anatomical entities is thus necessary for the automatic analysis of the structure of domain texts. Although a number of resources and methods addressing aspects of the task have been introduced, there have so far been no annotated corpora for training and evaluating systems for broad-coverage, open-domain anatomical entity mention detection. We introduce the AnEM corpus, a domain- and species-independent resource manually annotated for anatomical entity mentions using a fine-grained classification system. The corpus texts are selected randomly from citation abstracts and full-text papers with the aim of making the corpus representative of the entire available biomedical scientific literature. We demonstrate the use of the corpus through an evaluation of the broad-coverage MetaMap tagger and a CRF-based system trained on the corpus data, and we also consider a combination of these two methods. The combined system shows a promising level of performance, approaching 80% F-score for mention detection under a relaxed matching criterion. The corpus and other introduced resources are available under open licences from http://www.nactem.ac.uk/anatomy/.
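
The abstract reports F-score under a relaxed matching criterion but does not define that criterion. As a minimal sketch, the Python below assumes that "relaxed" means a predicted mention counts as correct if its character span overlaps any gold span, and compares this with exact span matching. The function names, the (start, end) span representation, and the example offsets are hypothetical illustrations and are not taken from the paper.

```python
from typing import List, Tuple

# A mention is a (start, end) character-offset pair, end exclusive (assumed format).
Span = Tuple[int, int]


def overlaps(a: Span, b: Span) -> bool:
    """True if the two spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def exact_prf(gold: List[Span], predicted: List[Span]) -> Tuple[float, float, float]:
    """Precision, recall and F-score where only identical spans count."""
    correct = len(set(gold) & set(predicted))
    precision = correct / len(set(predicted)) if predicted else 0.0
    recall = correct / len(set(gold)) if gold else 0.0
    return precision, recall, f_score(precision, recall)


def relaxed_prf(gold: List[Span], predicted: List[Span]) -> Tuple[float, float, float]:
    """Precision, recall and F-score where any overlap counts (assumed criterion)."""
    # Precision: share of predicted spans that overlap some gold span.
    hit_pred = sum(1 for p in predicted if any(overlaps(p, g) for g in gold))
    # Recall: share of gold spans that overlap some predicted span.
    hit_gold = sum(1 for g in gold if any(overlaps(g, p) for p in predicted))
    precision = hit_pred / len(predicted) if predicted else 0.0
    recall = hit_gold / len(gold) if gold else 0.0
    return precision, recall, f_score(precision, recall)


if __name__ == "__main__":
    # Hypothetical offsets: one exact hit, one partial hit, one miss on each side.
    gold = [(0, 6), (10, 16), (20, 25)]
    predicted = [(0, 6), (11, 14), (30, 35)]
    print("exact  :", exact_prf(gold, predicted))
    print("relaxed:", relaxed_prf(gold, predicted))
```

Under this assumed definition, the partially overlapping prediction is credited by the relaxed measure but not by the exact one, which is why relaxed scores are never lower than exact scores on the same output.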