Desiderata for ontologies to be used in semantic annotation of biomedical documents

  • Authors:
  • Michael Bada; Lawrence Hunter

  • Affiliations:
  • Department of Pharmacology, University of Colorado Denver, MS 8303, RC-1 South, 12801 East 17th Avenue, L18-6400, P.O. Box 6511, Aurora, CO 80045, USA (both authors)

  • Venue:
  • Journal of Biomedical Informatics
  • Year:
  • 2011

Abstract

A wealth of knowledge valuable to the translational research scientist is contained within the vast biomedical literature, but this knowledge is typically in the form of natural language. Sophisticated natural-language-processing systems are needed to translate text into unambiguous formal representations grounded in high-quality consensus ontologies, and these systems in turn rely on gold-standard corpora of annotated documents for training and testing. To this end, we are constructing the Colorado Richly Annotated Full-Text (CRAFT) Corpus, a collection of 97 full-text biomedical journal articles that are being manually annotated with the entire sets of terms from select vocabularies, predominantly from the Open Biomedical Ontologies (OBO) library. Our efforts in building this corpus have illuminated infelicities of these ontologies with respect to the semantic annotation of biomedical documents, and we propose desiderata whose implementation could substantially improve their utility in this task. These include the integration of overlapping terms across OBOs, the resolution of OBO-specific ambiguities, the integration of the BFO with the OBOs and the use of mid-level ontologies, the inclusion of noncanonical instances, and the expansion of relations and realizable entities.