Boosting Biomedical Information Retrieval Performance through Citation Graph: An Empirical Study

  • Authors:
  • Xiaoshi Yin; Xiangji Huang; Qinmin Hu; Zhoujun Li

  • Affiliations:
  • School of Information Technology, York University, Toronto, Ontario, Canada M3J 1P3 and School of Computer Science and Engineering, Beihang University, Beijing, China 100083; School of Information Technology, York University, Toronto, Ontario, Canada M3J 1P3; Computer Science Department, York University, Toronto, Ontario, Canada M3J 1P3; School of Computer Science and Engineering, Beihang University, Beijing, China 100083

  • Venue:
  • PAKDD '09 Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
  • Year:
  • 2009

Abstract

This paper presents an empirical study of combining content-based information retrieval results with linkage-based document importance scores to improve retrieval performance on TREC biomedical literature datasets. In our study, content-based information comes from Okapi, a state-of-the-art information retrieval system based on a probabilistic model. Linkage-based information comes from a citation graph generated from the REFERENCES sections of the documents in a biomedical literature dataset. Three well-known linkage-based ranking algorithms (PageRank, HITS and InDegree) are applied to the citation graph to calculate document importance scores. We use the TREC 2007 Genomics dataset, which contains 162,259 biomedical articles, for evaluation. Our approach achieves the best document-based MAP among all results reported so far. Our major findings can be summarized as follows. First, even without hyperlinks, linkage information extracted from REFERENCES sections can be used to improve the effectiveness of biomedical information retrieval. Second, the performance of the integrated system is sensitive to the choice of linkage-based ranking algorithm, and a simpler algorithm, InDegree, is more suitable for biomedical literature retrieval.
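
The general idea of scoring documents on a citation graph and blending those scores with content-based scores can be illustrated with a minimal Python sketch. The abstract does not state how the two kinds of scores are combined, so the linear interpolation in `combine`, the `weight` value, the max-normalization, and the toy document identifiers and scores below are all assumptions made for illustration, not the authors' method. HITS is omitted for brevity.

```python
from collections import defaultdict

def in_degree(citations):
    """InDegree: number of distinct documents citing each document."""
    scores = defaultdict(int)
    for src, targets in citations.items():
        scores[src] += 0  # make sure uncited documents still appear
        for t in set(targets):
            if t != src:  # ignore self-citations
                scores[t] += 1
    return dict(scores)

def pagerank(citations, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over the citation graph."""
    nodes = set(citations)
    for targets in citations.values():
        nodes.update(targets)
    n = len(nodes)
    out = {d: [t for t in set(citations.get(d, [])) if t != d] for d in nodes}
    rank = {d: 1.0 / n for d in nodes}
    for _ in range(iterations):
        # Rank held by documents with no outgoing citations is spread evenly.
        dangling = sum(rank[d] for d in nodes if not out[d])
        new = {d: (1 - damping) / n + damping * dangling / n for d in nodes}
        for d in nodes:
            for t in out[d]:
                new[t] += damping * rank[d] / len(out[d])
        rank = new
    return rank

def normalize(scores):
    """Scale scores into [0, 1] by dividing by the maximum (an assumption)."""
    hi = max(scores.values(), default=0)
    return {d: (s / hi if hi else 0.0) for d, s in scores.items()}

def combine(content, link, weight=0.2):
    """Hypothetical linear interpolation of content and linkage scores."""
    content_n, link_n = normalize(content), normalize(link)
    return {d: (1 - weight) * content_n[d] + weight * link_n.get(d, 0.0)
            for d in content}

# Toy citation graph: each document maps to the documents in its REFERENCES.
citations = {"A": ["C"], "B": ["C"], "C": [], "D": ["A", "C"]}
# Made-up content-based scores for the documents retrieved for one query.
content_scores = {"A": 2.0, "B": 2.0, "C": 1.8}

final = combine(content_scores, in_degree(citations))
ranking = sorted(final, key=final.get, reverse=True)
```

In this toy example the heavily cited document "C" moves ahead of "A" and "B" even though its content score is slightly lower, which is the kind of re-ranking effect a linkage signal is meant to produce. Passing `pagerank(citations)` instead of `in_degree(citations)` to `combine` swaps in the other linkage score without further changes.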