A semi-automatic method for annotating a biomedical proposition bank

  • Authors:
  • Wen-Chi Chou;Richard Tzong-Han Tsai;Ying-Shan Su;Wei Ku;Ting-Yi Sung;Wen-Lian Hsu

  • Affiliations:
  • Academia Sinica, Taiwan, ROC;Academia Sinica, Taiwan, ROC and National Taiwan University, Taiwan, ROC;Academia Sinica, Taiwan, ROC;Academia Sinica, Taiwan, ROC and National Taiwan University, Taiwan, ROC;Academia Sinica, Taiwan, ROC;Academia Sinica, Taiwan, ROC

  • Venue:
  • LAC '06 Proceedings of the Workshop on Frontiers in Linguistically Annotated Corpora 2006
  • Year:
  • 2006

Abstract

In this paper, we present a semi-automatic approach for annotating semantic information in biomedical texts. The information is used to construct a biomedical proposition bank called BioProp. Like PropBank in the newswire domain, BioProp contains annotations of predicate-argument structures and semantic roles in a treebank schema. To construct BioProp, a semantic role labeling (SRL) system trained on PropBank first tags the biomedical corpus automatically. Incorrect tagging results are then corrected by human annotators. To suit the needs of the biomedical domain, we modify the PropBank annotation guidelines and characterize semantic roles as components of biological events. The method can substantially reduce annotation effort, and we introduce a measure of an upper bound on the effort saved. Thus far, the method has been applied experimentally to a 4,389-sentence treebank corpus for the construction of BioProp. Inter-annotator agreement, measured by the kappa statistic, reaches 0.95 for the combined decision of role identification and classification when all argument labels are considered. In addition, we show that, when trained on BioProp, our biomedical SRL system, BIOSMILE, achieves an F-score of 87%.
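The agreement figure above is a kappa statistic, which corrects raw agreement for the agreement expected by chance. The abstract does not say which kappa variant was used, so the following is only a minimal sketch that assumes Cohen's kappa for two annotators. The function name `cohen_kappa`, the label strings, and the six-item sample are hypothetical illustrations, not data from BioProp.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    Assumption: two annotators, one label per item. The abstract does
    not state which kappa variant the authors computed.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of the annotators' marginal label
    # frequencies, summed over labels.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if p_e == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical PropBank-style role labels for six candidate constituents;
# "NONE" marks a constituent judged not to be an argument.
annotator_1 = ["Arg0", "Arg1", "ArgM-LOC", "NONE", "Arg1", "NONE"]
annotator_2 = ["Arg0", "Arg1", "ArgM-TMP", "NONE", "Arg1", "NONE"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Treating "NONE" as one more label is one way to score role identification and classification as a single combined decision, since a disagreement about whether a constituent is an argument at all then counts the same as a disagreement about its role.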