Building an annotated corpus in the molecular-biology domain

  • Authors:
  • Yuka Tateisi; Tomoko Ohta; Nigel Collier; Chikashi Nobata; Jun-ichi Tsujii

  • Affiliations:
  • University of Tokyo, Hongo, Bunkyo-ku, Tokyo, Japan (all authors)

  • Venue:
  • Proceedings of the COLING-2000 Workshop on Semantic Annotation and Intelligent Content
  • Year:
  • 2000

Abstract

Corpus annotation is now a key topic for all areas of natural language processing (NLP) and information extraction (IE) that employ supervised learning. With the explosion of results in molecular biology, there is an increased need for IE to extract knowledge to support database building and to search intelligently for information in online journal collections. To support this, we are building a corpus of annotated abstracts taken from the National Library of Medicine's MEDLINE database. In this paper we report on this new corpus, its ontological basis, and our experience in designing the annotation scheme. Experimental results are shown for inter-annotator agreement, and comments are made on methodological considerations.