The GENIA corpus: an annotated research abstract corpus in molecular biology domain

  • Authors:
  • Tomoko Ohta;Yuka Tateisi;Jin-Dong Kim

  • Affiliations:
  • University of Tokyo, Bunkyo-ku, Tokyo, Japan;CREST, JST, Bunkyo-ku, Tokyo, Japan;CREST, JST, Bunkyo-ku, Tokyo, Japan

  • Venue:
  • HLT '02 Proceedings of the second international conference on Human Language Technology Research
  • Year:
  • 2002

Abstract

With the information overload in the genome-related field, there is an increasing need for natural language processing (NLP) technology to extract information from the literature, and various attempts at information extraction using NLP have been made. We are developing the necessary resources, including a domain ontology and an annotated corpus of research abstracts from the MEDLINE database (the GENIA corpus). We are building the ontology and the corpus simultaneously, with each informing the development of the other. In this paper we report on our new corpus, its ontological basis, its annotation scheme, and statistics of the annotated objects. We also describe the tools used for corpus annotation and management.
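As a rough illustration of what "statistics of annotated objects" can mean in practice, the sketch below counts annotated terms per ontology class. It assumes an inline XML markup in which each term is a `cons` element whose `sem` attribute names its ontology class, in the style associated with GENIA term annotation. The element and attribute names, the `count_annotations` helper, and the sample sentence are all assumptions for illustration. This is not the exact scheme or tooling described in the paper, and the sample is invented rather than taken from the corpus.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical toy fragment in a GENIA-style inline markup (assumed format):
# each annotated term is a <cons> element whose "sem" attribute names an
# ontology class. This sample is invented and is not taken from the corpus.
SAMPLE = """<sentence>
<cons sem="G#protein_molecule">IL-2</cons> expression in
<cons sem="G#cell_type">T cells</cons> requires
<cons sem="G#protein_molecule">NF-kappa B</cons>.
</sentence>"""


def count_annotations(xml_text: str) -> Counter:
    """Count annotated terms per ontology class (the assumed "sem" attribute)."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    # iter() walks the whole tree, so nested <cons> elements are counted too.
    for cons in root.iter("cons"):
        sem = cons.get("sem")
        if sem is not None:
            counts[sem] += 1
    return counts


if __name__ == "__main__":
    for sem, n in count_annotations(SAMPLE).most_common():
        print(f"{sem}\t{n}")
```

Walking the tree with `iter()` rather than looking only at direct children means that embedded annotations (a term annotated inside a longer term) would each be counted once, which is one reasonable convention for per-class totals. A real corpus count would run the same loop over every abstract file and sum the resulting counters.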