MedTag: a collection of biomedical annotations

  • Authors: L. H. Smith, L. Tanabe, T. Rindflesch, W. J. Wilbur

  • Affiliations: National Center for Biotechnology Information (Smith, Tanabe, Wilbur); Lister Hill National Center for Biomedical Communications, Bethesda, MD (Rindflesch)

  • Venue: ISMB '05 Proceedings of the ACL-ISMB Workshop on Linking Biological Literature, Ontologies and Databases: Mining Biological Semantics
  • Year: 2005

Abstract

We present a database of annotated biomedical text corpora merged into a portable data structure with uniform conventions. MedTag combines three corpora, MedPost, ABGene and GENETAG, within a common relational data model. The GENETAG corpus has been modified to reflect new definitions of genes and proteins. The MedPost corpus has been extended with 1,000 additional sentences from the clinical medicine domain. All data now include the original MEDLINE text excerpts and PubMed identifiers and have been made independent of tokenization, which improves accuracy, consistency and usability. The data are available as flat files, together with software for loading them into a relational SQL database, at ftp://ftp.ncbi.nlm.nih.gov/pub/lsmith/MedTag/medtag.tar.gz.
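
As a rough illustration of the flat-file-to-SQL workflow described in the abstract, the sketch below loads tab-separated annotation records into an in-memory SQLite table. It is not the loader shipped with MedTag: the table name `annotation`, its columns, the sample records, and the use of character offsets as a way to stay independent of tokenization are all assumptions made for this example, since the abstract does not specify the actual file layout or schema.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file content: one annotation per line, tab-separated as
# (PubMed ID, source corpus, start offset, end offset, label).
# The real MedTag file layout is not described in the abstract.
SAMPLE = (
    "12345\tMedPost\t0\t3\tNN\n"
    "12345\tGENETAG\t10\t14\tGENE\n"
    "67890\tABGene\t5\t9\tGENE\n"
)


def load_annotations(conn, lines):
    """Create the (assumed) annotation table and insert every record."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS annotation (
               pmid         INTEGER NOT NULL,
               corpus       TEXT    NOT NULL,
               start_offset INTEGER NOT NULL,
               end_offset   INTEGER NOT NULL,
               label        TEXT    NOT NULL)"""
    )
    # Character offsets into the original excerpt (an assumption here) keep
    # the annotation independent of any particular tokenizer.
    rows = [
        (int(pmid), corpus, int(start), int(end), label)
        for pmid, corpus, start, end, label in csv.reader(lines, delimiter="\t")
    ]
    conn.executemany("INSERT INTO annotation VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
count = load_annotations(conn, io.StringIO(SAMPLE))

# With all corpora in one table, a single query can span them.
per_corpus = dict(
    conn.execute("SELECT corpus, COUNT(*) FROM annotation GROUP BY corpus")
)
```

Keeping the source corpus as a column rather than a separate table per corpus is one simple way to model the "common relational data model" idea; the actual MedTag schema may be organized differently.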