Relation-Based document retrieval for biomedical literature databases

  • Authors:
  • Xiaohua Zhou;Xiaohua Hu;Xia Lin;Hyoil Han;Xiaodan Zhang

  • Affiliations:
  • College of Information Science & Technology, Drexel University, Philadelphia, PA;College of Information Science & Technology, Drexel University, Philadelphia, PA;College of Information Science & Technology, Drexel University, Philadelphia, PA;College of Information Science & Technology, Drexel University, Philadelphia, PA;College of Information Science & Technology, Drexel University, Philadelphia, PA

  • Venue:
  • DASFAA'06 Proceedings of the 11th international conference on Database Systems for Advanced Applications
  • Year:
  • 2006

Abstract

In this paper, we explore the direct use of relations in information retrieval for precision-focused biomedical literature search. A relation is defined as a pair of concepts that are semantically and syntactically related to each other. Unlike traditional term-based IR models, our model represents a document by a set of controlled concepts and their binary relations. Since document-level co-occurrence of two concepts does not, in many cases, mean that the document actually addresses their relationship, the direct use of relations may improve the precision of very specific searches, e.g., searching for documents that mention genes regulated by Smad4. For this purpose, we develop a generic ontology-based approach to extract concepts and their relations; a prototype IR system supporting relation-based search is then built for Medline abstract search. We then use this novel IR system to improve the retrieval results of all official runs in the TREC-2004 Genomics Track. The experiment shows promising performance for relation-based IR. The mean P@100 (the precision of the top 100 documents) over all 50 topics is raised from 26.37% (the P@100 of the best run is 42.10%) to 53.69%, while recall is kept at an acceptable level of 44.31%. The experiment also demonstrates the expressiveness of relations for representing genomic information needs.
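
The difference between document-level co-occurrence and an explicit relation can be illustrated with a minimal sketch. Everything in it is hypothetical and is not taken from the paper's system: the `Doc` class, the helper functions, the toy documents, and the placeholder concept `GeneX`. Relations are assumed here to be unordered concept pairs, and precision at k is computed in the usual way, as the fraction of the top k retrieved documents that are relevant.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Toy document: a set of concepts plus a set of binary relations."""
    doc_id: str
    concepts: set = field(default_factory=set)
    # Assumption: a relation is an unordered pair, stored as a frozenset.
    relations: set = field(default_factory=set)


def cooccurrence_search(docs, a, b):
    """Return ids of documents in which both concepts merely appear."""
    return [d.doc_id for d in docs if a in d.concepts and b in d.concepts]


def relation_search(docs, a, b):
    """Return ids of documents that contain the relation (a, b) itself."""
    pair = frozenset((a, b))
    return [d.doc_id for d in docs if pair in d.relations]


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top k retrieved documents that are relevant."""
    if k <= 0:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


# Hypothetical toy collection; "GeneX" is a placeholder, not a real finding.
docs = [
    Doc("d1", {"Smad4", "GeneX"}, {frozenset(("Smad4", "GeneX"))}),
    Doc("d2", {"Smad4", "GeneX"}),  # concepts co-occur but are not related
    Doc("d3", {"Smad4"}),
]
relevant = {"d1"}

by_cooccurrence = cooccurrence_search(docs, "Smad4", "GeneX")
by_relation = relation_search(docs, "Smad4", "GeneX")

print(by_cooccurrence, precision_at_k(by_cooccurrence, relevant, 2))
print(by_relation, precision_at_k(by_relation, relevant, 1))
```

In this toy collection the co-occurrence query also returns `d2`, which mentions both concepts without relating them, while the relation query returns only `d1`. This is the precision effect the abstract describes, shown at a much smaller scale and without any ranking model.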