Improving accuracy for identifying related PubMed queries by an integrated approach

Authors: Zhiyong Lu; W. John Wilbur
Affiliation (both authors): National Center for Biotechnology Information, National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894, USA
Venue: Journal of Biomedical Informatics
Year: 2009

Abstract

PubMed is the most widely used tool for searching biomedical literature online. As with many other online search tools, a user often types a series of related queries before retrieving results that satisfy a single information need. It is also common for a user to type queries on unrelated topics within a single session. To study PubMed users' search strategies, it is therefore necessary to automatically separate unrelated queries and group related ones. Here, we report a novel approach that combines lexical and contextual analyses to segment PubMed query sessions and identify related queries, and we compare its performance with the previous approach based solely on concept mapping. We evaluated our integrated approach on sample data consisting of 1539 pairs of consecutive user queries in 351 user sessions. The predictions for 1396 pairs agreed with the gold-standard annotations, an overall accuracy of 90.7%. This demonstrates that our approach is significantly better than the previously published method. By applying this approach to a one-day query log of PubMed, we found that a significant proportion of information needs involved more than one PubMed query, and that most consecutive queries for the same information need are lexically related. Finally, the proposed PubMed distance is shown to be an accurate and meaningful measure of the contextual similarity between biological terms. As our experiments demonstrate, the integrated approach can play a critical role in handling real-world PubMed query log data.
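
The abstract does not give the formula for the PubMed distance or the exact rule for combining the lexical and contextual analyses. The following is only a minimal sketch of how such an integrated check might look. It assumes that the contextual measure takes the form of the Normalized Google Distance computed over PubMed hit counts, and that the lexical test is simple token overlap. The function names, the threshold of 0.5, and all hit counts are hypothetical and are not taken from the paper. The final line only reproduces the accuracy arithmetic reported in the abstract.

```python
import math


def normalized_distance(fx, fy, fxy, n):
    """NGD-style distance from hit counts (an assumed stand-in for PubMed distance).

    fx, fy : number of records matching each term alone
    fxy    : number of records matching both terms
    n      : total number of records in the collection
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return math.inf  # terms never co-occur: treat as maximally distant
    if n <= min(fx, fy):
        raise ValueError("n must exceed the individual hit counts")
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


def lexically_related(q1, q2):
    """Hypothetical lexical test: the two queries share at least one token."""
    return bool(set(q1.lower().split()) & set(q2.lower().split()))


def related(q1, q2, fx, fy, fxy, n, threshold=0.5):
    """Lexical check first, then fall back to the contextual distance.

    The threshold is an illustrative value, not one reported in the paper.
    """
    if lexically_related(q1, q2):
        return True
    return normalized_distance(fx, fy, fxy, n) < threshold


# Accuracy reported in the abstract: 1396 of 1539 query pairs agree.
accuracy = round(1396 / 1539, 3)
```

With hypothetical counts, `related("p53", "apoptosis", fx=100, fy=100, fxy=90, n=10000)` would fall through the lexical test and be decided by the distance alone, whereas `related("breast cancer", "breast cancer treatment", ...)` would be accepted on token overlap without consulting any counts.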