Boosting Biomedical Information Retrieval Performance through Citation Graph: An Empirical Study

Authors:
Xiaoshi Yin; Xiangji Huang; Qinmin Hu; Zhoujun Li
Affiliations:
School of Information Technology, York University, Toronto, Ontario, Canada M3J 1P3 and School of Computer Science and Engineering, Beihang University, Beijing, China 100083; School of Information Technology, York University, Toronto, Ontario, Canada M3J 1P3; Computer Science Department, York University, Toronto, Ontario, Canada M3J 1P3; School of Computer Science and Engineering, Beihang University, Beijing, China 100083
Venue:
PAKDD '09 Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
Year:
2009

Citing 8

The quest for correct information on the Web: hyper search engines
Selected papers from the sixth international conference on World Wide Web

WebQuery: searching and visualizing the Web through connectivity
Selected papers from the sixth international conference on World Wide Web

The anatomy of a large-scale hypertextual Web search engine
WWW7 Proceedings of the seventh international conference on World Wide Web 7

Authoritative sources in a hyperlinked environment
Journal of the ACM (JACM)

Applying Machine Learning to Text Segmentation for Information Retrieval
Information Retrieval

Characterizing and Mining the Citation Graph of the Computer Science Literature
Knowledge and Information Systems

Link analysis ranking: algorithms, theory, and experiments
ACM Transactions on Internet Technology (TOIT)

HITS on the web: how does it compare?
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval

Cited by 4

Mining and modeling linkage information from citation context for improving biomedical literature retrieval
Information Processing and Management: an International Journal

Firework visualization: a model for local citation analysis
Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium

Re-ranking with context for high-performance biomedical information retrieval
International Journal of Data Mining and Bioinformatics

Using semantic-based association rule mining for improving clinical text retrieval
HIS'13 Proceedings of the second international conference on Health Information Science


Abstract

This paper presents an empirical study of combining content-based information retrieval results with linkage-based document importance scores to improve retrieval performance on TREC biomedical literature datasets. In our study, the content-based information comes from Okapi, a state-of-the-art information retrieval system based on a probabilistic model. The linkage-based information comes from a citation graph generated from the REFERENCES sections of a biomedical literature dataset. Three well-known linkage-based ranking algorithms (PageRank, HITS and InDegree) are applied to the citation graph to calculate document importance scores. We use the TREC 2007 Genomics dataset for evaluation, which contains 162,259 biomedical documents. Our approach achieves the best document-based MAP among all results that have been reported so far. Our major findings can be summarized as follows. First, even without hyperlinks, linkage information extracted from REFERENCES sections can be used to improve the effectiveness of biomedical information retrieval. Second, the performance of the integrated system is sensitive to the choice of linkage-based ranking algorithm, and a simpler algorithm, InDegree, is more suitable for biomedical literature retrieval.
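
The general idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how the content-based and linkage-based scores are combined, so the min-max normalization, the linear interpolation and the `weight` parameter below are assumptions made only for illustration, and the function names (`indegree_scores`, `min_max`, `rerank`) and toy document ids are hypothetical. Only InDegree is shown, since it is the simplest of the three algorithms named above.

```python
from collections import defaultdict


def indegree_scores(citations):
    """Count incoming citations for each document.

    citations: dict mapping a citing document id to the ids it references.
    """
    scores = defaultdict(int)
    for citing, cited_docs in citations.items():
        scores[citing] += 0  # make sure every citing document has an entry
        for cited in set(cited_docs):  # count each citing document once
            if cited != citing:  # ignore self-citations
                scores[cited] += 1
    return dict(scores)


def min_max(scores):
    """Rescale scores to [0, 1]; all zeros if every score is equal."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def rerank(content_scores, link_scores, weight=0.2):
    """Re-rank retrieved documents by an assumed linear interpolation.

    weight is the share given to the linkage-based score (an assumption,
    not a value taken from the paper).
    """
    content = min_max(content_scores)
    link = min_max({doc: link_scores.get(doc, 0) for doc in content_scores})
    combined = {
        doc: (1 - weight) * content[doc] + weight * link[doc]
        for doc in content
    }
    return sorted(combined, key=combined.get, reverse=True)


# Toy citation graph: d1 and d2 both cite d3, and d4 cites d2.
citations = {"d1": ["d3"], "d2": ["d3"], "d3": [], "d4": ["d2"]}
link_scores = indegree_scores(citations)

# Hypothetical content-based scores for the documents retrieved for a query.
content_scores = {"d1": 12.0, "d2": 11.5, "d3": 11.0}

print(rerank(content_scores, link_scores, weight=0.2))
print(rerank(content_scores, link_scores, weight=0.8))
```

With a small `weight` the content-based order is preserved, while a large `weight` lets the heavily cited document rise to the top. In practice the weight would be tuned on held-out queries.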