Features combination for extracting gene functions from MEDLINE

  • Authors:
  • Patrick Ruch; Laura Perret; Jacques Savoy

  • Affiliations:
  • University Hospital of Geneva; University of Neuchâtel; University of Neuchâtel

  • Venue:
  • ECIR'05 Proceedings of the 27th European conference on Advances in Information Retrieval Research
  • Year:
  • 2005

Abstract

This paper describes and evaluates a summarization system that extracts textual descriptions of gene function (called GeneRIFs) from a MedLine record. The inputs for this task are a locus (a gene in the LocusLink database) and a pointer to a MedLine record supporting the GeneRIF. The proposed approach merges two independent phrase extraction strategies. The first strategy (LASt) uses argumentative, positional, and structural features to suggest a GeneRIF. The second (LogReg) uses statistical properties to select the most appropriate sentence as the GeneRIF. On the TREC-2003 genomic collection, each basic extraction strategy is already competitive on its own (52.78% for LASt and 52.28% for LogReg). Combining the two clearly improves the extraction task, achieving a Dice score of over 55%.
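
To make the evaluation measure concrete, the following is a minimal sketch of a classic set-based Dice coefficient between a candidate sentence and a reference GeneRIF. It is an illustration only: the tokenizer, the function names (`tokenize`, `dice`, `best_sentence`), and the choice of the plain word-set variant are assumptions made here, and the abstract does not specify which Dice variant or preprocessing the authors used.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens.

    Assumption: a simple regex tokenizer; the paper's preprocessing may differ.
    """
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def dice(candidate, reference):
    """Classic Dice coefficient: 2 * |A & B| / (|A| + |B|) over token sets."""
    a, b = tokenize(candidate), tokenize(reference)
    if not a and not b:
        return 0.0  # convention chosen here for two empty inputs
    return 2 * len(a & b) / (len(a) + len(b))


def best_sentence(sentences, reference):
    """Return the sentence with the highest Dice overlap with the reference.

    Hypothetical helper: it only shows how extracted candidates could be
    scored against a known GeneRIF during evaluation.
    """
    return max(sentences, key=lambda s: dice(s, reference))
```

A score of 1.0 means the candidate and the reference share exactly the same set of tokens, and 0.0 means they share none.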