A supervised learning approach to biological question answering

  • Authors:
  • Ryan T. K. Lin;Justin Liang-Te Chiu;Hong-Jie Dai;Richard Tzong-Han Tsai;Min-Yuh Day;Wen-Lian Hsu

  • Affiliations:
  • Institute of Information Science, Academia Sinica, Taipei, Taiwan;Institute of Information Science, Academia Sinica, Taipei, Taiwan and Department of Computer Science & Information Engineering, National Taiwan University, Taipei, Taiwan;Institute of Information Science, Academia Sinica, Taipei, Taiwan and Department of Computer Science, National Tsing-Hua University, Hsinchu, Taiwan;(Corresponding author. Tel.: +886 3 4638800 ext. 2367 ext. 7062/ Fax: +886 3 4638850/ E-mail: thtsai@saturn.yzu.edu.tw) Department of Computer Science & Engineering, Yuan Ze University, Taoyuan, Taiwan;Institute of Information Science, Academia Sinica, Taipei, Taiwan;Institute of Information Science, Academia Sinica, Taipei, Taiwan and Department of Computer Science, National Tsing-Hua University, Hsinchu, Taiwan

  • Venue:
  • Integrated Computer-Aided Engineering - Selected papers from the IEEE Conference on Information Reuse and Integration (IRI), July 13-15, 2008
  • Year:
  • 2009

Abstract

Biologists rely on keyword-based search engines to retrieve superficially relevant papers, from which they must manually filter out irrelevant information. Question answering (QA) systems can offer a more efficient and user-friendly way of retrieving such information. This paper makes two contributions. First, we develop a factoid QA system that employs a named entity recognition module to extract answer candidates and a linear model to rank them. The linear model uses various semantic features, such as named entity types and semantic roles. To tune the feature weights used by the model, we provide a novel supervised learning algorithm that needs only a small amount of training data. Second, a QA system may assign the same score to several answers, which makes evaluation unfair. To solve this problem, we propose an efficient formula for the mean average reciprocal rank (MARR) that reduces the complexity of its computation. After employing all effective semantic features, our system achieves a top-1 MARR of 74.11% and a top-5 MARR of 76.68%. Compared with the baseline system, the top-1 and top-5 MARR increase by 9.5% and 7.1%, respectively. In addition, experimental results on the test set show that our ranking method, which achieves 55.58% top-1 MARR and 66.99% top-5 MARR, significantly surpasses traditional BM25 and simple voting by an average of 35.23% and 36.64%, respectively.
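
The abstract does not give the MARR formula itself, so the following is only a minimal sketch of one plausible way to score tied answers fairly. It assumes a single correct candidate per question and that ties are broken uniformly at random. Under those assumptions, the expected reciprocal rank can be computed in closed form instead of enumerating every ordering of the tied candidates. The function names `expected_reciprocal_rank` and `mean_expected_rr`, and the `top_k` cutoff semantics, are hypothetical and not taken from the paper.

```python
from fractions import Fraction


def expected_reciprocal_rank(scores, correct_index, top_k):
    """Expected reciprocal rank of the correct candidate under random tie-breaking.

    Assumption (not from the paper): exactly one candidate is correct, and
    candidates with equal scores are ordered uniformly at random.
    """
    target = scores[correct_index]
    higher = sum(1 for s in scores if s > target)   # candidates ranked strictly above
    tied = sum(1 for s in scores if s == target)    # tie group size, includes the correct one
    total = Fraction(0)
    # The correct candidate is equally likely to land on any rank in the tie group.
    for rank in range(higher + 1, higher + tied + 1):
        if rank <= top_k:                           # ranks beyond the cutoff score zero
            total += Fraction(1, rank)
    return total / tied


def mean_expected_rr(questions, top_k):
    """Average the per-question expectation over (scores, correct_index) pairs."""
    values = [expected_reciprocal_rank(s, c, top_k) for s, c in questions]
    return sum(values) / len(values)
```

For example, with scores `[0.9, 0.9, 0.5]` and the correct answer among the two tied leaders, the sketch averages the reciprocal ranks 1 and 1/2 rather than arbitrarily rewarding or penalizing the system for the tie.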