Corpus construction for extracting disease-gene relations

  • Authors:
  • Hong-Woo Chun;Sa-Kwang Song;Sung-Pil Choi;Hanmin Jung

  • Affiliations:
  • Korea Institute of Science and Technology Information (KISTI), Daejeon, South Korea (all four authors)

  • Venue:
  • ISMIS'12 Proceedings of the 20th international conference on Foundations of Intelligent Systems
  • Year:
  • 2012

Abstract

Many corpus-based statistical methods have been used to extract disease-gene relations (DGRs) from the literature. The corpus-based approach has two limitations: the corpora available for training a system are insufficient, and most previous research has dealt only with a binary relation rather than with various types of DGRs. In other words, the common task has been to determine whether a relation is present at all. In practice, however, a binary relation is not enough to explain a DGR. One solution is to construct a corpus that supports the analysis of various types of relations between diseases and their related genes. This article describes a corpus construction process for DGRs. Eleven topics of relations were defined by biologists. Four annotators participated in the corpus annotation task, and their inter-annotator agreement was calculated to show the reliability of the annotation results. The gold standard data from the proposed approach can be used to improve performance on many research tasks, such as recognition of gene and disease names and extraction of fine-grained DGRs. The corpus will be released through the GENIA project home page.
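
The abstract does not say which agreement measure was used. As an illustration only, the sketch below computes Fleiss' kappa, one common choice when more than two annotators label the same items. The topic labels (`topic_1`, `topic_2`, ...) and the sample ratings are hypothetical placeholders, not data from the corpus.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for items that were each labeled by the same number of annotators.

    ratings: list of items; each item is a list with one category label
    per annotator (assumed format, e.g. four labels per item).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number of ratings")

    categories = sorted({label for item in ratings for label in item})
    counts = [Counter(item) for item in ratings]

    # Observed agreement: share of agreeing annotator pairs, averaged over items.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall proportion of each category.
    total = n_items * n_raters
    p_expected = sum(
        (sum(c[cat] for c in counts) / total) ** 2 for cat in categories
    )

    if p_expected == 1.0:
        # Only one category was ever used; agreement is trivially perfect.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical example: three candidate relations, four annotators each.
example = [
    ["topic_1", "topic_1", "topic_1", "topic_1"],
    ["topic_2", "topic_2", "topic_2", "topic_1"],
    ["topic_3", "topic_3", "topic_3", "topic_3"],
]
kappa = fleiss_kappa(example)
```

A value near 1 indicates agreement well above chance, and a value near 0 indicates agreement no better than chance. With eleven relation topics, a chance-corrected measure of this kind is more informative than raw percentage agreement.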