Open-domain anatomical entity mention detection

  • Authors: Tomoko Ohta; Sampo Pyysalo; Jun'ichi Tsujii; Sophia Ananiadou

  • Affiliations: National Centre for Text Mining and University of Manchester, Manchester, UK; National Centre for Text Mining and University of Manchester, Manchester, UK; Microsoft Research Asia, Beijing, China; National Centre for Text Mining and University of Manchester, Manchester, UK

  • Venue: ACL '12 Proceedings of the Workshop on Detecting Structure in Scholarly Discourse
  • Year: 2012

Abstract

Anatomical entities such as kidney, muscle and blood are central to much of biomedical scientific discourse, and the detection of mentions of anatomical entities is thus necessary for the automatic analysis of the structure of domain texts. Although a number of resources and methods addressing aspects of the task have been introduced, there have so far been no annotated corpora for training and evaluating systems for broad-coverage, open-domain anatomical entity mention detection. We introduce the AnEM corpus, a domain- and species-independent resource manually annotated for anatomical entity mentions using a fine-grained classification system. The corpus texts are selected randomly from citation abstracts and full-text papers with the aim of making the corpus representative of the entire available biomedical scientific literature. We demonstrate the use of the corpus through an evaluation of the broad-coverage MetaMap tagger and a CRF-based system trained on the corpus data, considering also a combination of these two methods. The combined system demonstrates a promising level of performance, approaching 80% F-score for mention detection under a relaxed matching criterion. The corpus and other introduced resources are available under open licences from http://www.nactem.ac.uk/anatomy/.
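
The abstract reports F-score for mention detection under a relaxed matching criterion but does not define that criterion. As a rough illustration only, the sketch below contrasts strict (exact-boundary) matching with one common relaxed variant, in which a predicted mention counts as correct if it overlaps a gold mention. The overlap rule, the function names `overlaps` and `f_score`, and the toy spans are all assumptions made for this example; they are not taken from the paper or from the AnEM evaluation tools.

```python
from typing import List, Tuple

# A mention is a (start, end) character span, end exclusive (assumed convention).
Span = Tuple[int, int]


def overlaps(a: Span, b: Span) -> bool:
    """Return True if the two spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def f_score(gold: List[Span], pred: List[Span], relaxed: bool = False) -> float:
    """F-score of predicted mentions against gold mentions.

    strict:  a prediction is correct only if its boundaries match exactly.
    relaxed: a prediction is correct if it overlaps any gold mention
             (an assumed definition, used here for illustration).
    """
    if relaxed:
        correct_pred = sum(any(overlaps(p, g) for g in gold) for p in pred)
        found_gold = sum(any(overlaps(g, p) for p in pred) for g in gold)
    else:
        gold_set, pred_set = set(gold), set(pred)
        correct_pred = sum(p in gold_set for p in pred)
        found_gold = sum(g in pred_set for g in gold)

    precision = correct_pred / len(pred) if pred else 0.0
    recall = found_gold / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data: one exact match, one boundary error, one spurious prediction.
gold_spans = [(0, 6), (20, 26), (40, 45)]
pred_spans = [(0, 6), (18, 26), (60, 65)]

strict_f = f_score(gold_spans, pred_spans)
relaxed_f = f_score(gold_spans, pred_spans, relaxed=True)
print(f"strict F = {strict_f:.3f}, relaxed F = {relaxed_f:.3f}")
```

On this toy input the boundary error is penalized under strict matching but credited under relaxed matching, which is why relaxed scores are typically higher than strict ones for the same system output.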