Improving biomedical document retrieval using domain knowledge

  • Authors:
  • Shuguang Wang;Milos Hauskrecht

  • Affiliations:
  • University of Pittsburgh, Pittsburgh, PA, USA;University of Pittsburgh, Pittsburgh, PA, USA

  • Venue:
  • Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

Research articles typically introduce new results or findings and relate them to knowledge entities of immediate relevance. However, a large body of context knowledge related to the results is often not explicitly mentioned in the article. To overcome this limitation the state-of-the-art information retrieval approaches rely on the latent semantic analysis in which terms in articles are projected to a lower dimensional latent space and best possible matches in this space are identified. However, this approach may not perform well enough if the number of explicit knowledge entities in the articles is too small compared to the amount of knowledge in the domain. We address the problem by exploiting a domain knowledge layer, a rich network of relations among knowledge entities in the domain extracted from a large corpus of documents. The knowledge layer supplies the context knowledge that lets us relate different knowledge entities and hence improve the information retrieval performance. We develop and study a new framework for i) learning and aggregating the relations in the knowledge layer from the literature corpus; ii) and for exploiting these relations to improve the information-retrieval of relevant documents.