PubMed is the most widely used tool for searching biomedical literature online. As with many other online search tools, a user often issues a series of related queries before retrieving results that satisfy a single information need. It is also common for a user to issue queries on unrelated topics within a single session. To study PubMed users' search strategies, it is therefore necessary to automatically separate unrelated queries and group related ones. Here, we report a novel approach that combines lexical and contextual analyses to segment PubMed query sessions and identify related queries, and we compare its performance with the previous approach based solely on concept mapping. We evaluated our integrated approach on sample data consisting of 1539 pairs of consecutive user queries in 351 user sessions. Predictions for 1396 pairs agreed with the gold-standard annotations, an overall accuracy of 90.7%. This demonstrates that our approach is significantly better than the previously published method. Applying this approach to a one-day PubMed query log, we found that a significant proportion of information needs involved more than one PubMed query, and that most consecutive queries for the same information need are lexically related. Finally, the proposed PubMed distance is shown to be an accurate and meaningful measure of contextual similarity between biological terms. As our experiments demonstrate, the integrated approach can play a critical role in handling real-world PubMed query log data.
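The abstract does not spell out how the lexical and contextual checks are combined or how the PubMed distance is computed. The following is only a minimal sketch under stated assumptions: it assumes the PubMed distance has the same form as the normalized Google distance (with PubMed hit counts in place of web hit counts), that "lexically related" means sharing at least one token, and that a pair is related when either check passes. The function names, the `hits` dictionary, and the threshold of 0.5 are hypothetical and are not taken from the paper.

```python
import math


def normalized_distance(fx, fy, fxy, n):
    """Distance in the form of the normalized Google distance.

    fx, fy: hit counts for each term alone; fxy: hit count for both
    terms together; n: total number of indexed documents.
    Assumption: the paper's PubMed distance has this same form.
    """
    if fx == 0 or fy == 0 or fxy == 0:
        return math.inf  # terms never co-occur: treat as maximally distant
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


def lexically_related(q1, q2):
    """Hypothetical lexical check: the two queries share at least one token."""
    return bool(set(q1.lower().split()) & set(q2.lower().split()))


def is_related(q1, q2, hits, n, threshold=0.5):
    """Label a pair of consecutive queries as related or unrelated.

    hits is a hypothetical lookup of precomputed hit counts keyed by
    q1, q2, and the tuple (q1, q2). The threshold is illustrative only.
    """
    if lexically_related(q1, q2):
        return True
    d = normalized_distance(hits.get(q1, 0), hits.get(q2, 0),
                            hits.get((q1, q2), 0), n)
    return d < threshold


# Reported accuracy: 1396 of 1539 query pairs matched the gold standard.
accuracy = round(1396 / 1539 * 100, 1)
```

In this sketch the lexical check runs first because it needs no hit counts; the distance is consulted only for pairs with no shared tokens. A real implementation would need actual PubMed hit counts and a threshold tuned on annotated data.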