Modeling actions of PubMed users with n-gram language models

  • Authors:
  • Jimmy Lin; W. John Wilbur

  • Affiliations:
  • The iSchool, College of Information Studies, University of Maryland, College Park, USA and National Center for Biotechnology Information, National Library of Medicine, Bethesda, USA; National Center for Biotechnology Information, National Library of Medicine, Bethesda, USA

  • Venue:
  • Information Retrieval
  • Year:
  • 2009

Abstract

Transaction logs from online search engines are valuable for two reasons. First, they provide insight into human information-seeking behavior. Second, log data can be used to train user models, which can then be applied to improve retrieval systems. This article presents a study of logs from PubMed®, the public gateway to the MEDLINE® database of bibliographic records from the medical and biomedical primary literature. Unlike most previous studies of general Web search, our work examines user activities with a highly specialized search engine. We encode user actions as string sequences and model these sequences using n-gram language models. The models are evaluated in terms of perplexity and in a sequence prediction task. They help us better understand how PubMed users search for information and provide a basis for improving users' search experience.
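To make the modeling idea concrete, here is a minimal sketch of a bigram (n = 2) model over action strings, with perplexity and next-action prediction. Everything specific here is an assumption for illustration, not the authors' method: the one-letter action alphabet (Q, N, A, F), the session-start symbol `<s>`, add-one (Laplace) smoothing, the helper names `train_bigram`, `prob`, `perplexity`, and `predict_next`, and the toy sessions, which are invented and not drawn from real PubMed logs.

```python
import math
from collections import Counter, defaultdict

# Hypothetical action alphabet (illustrative only, not the paper's encoding):
#   Q = issue a query, N = view the next page of results,
#   A = view an abstract, F = follow a full-text link.
BOS = "<s>"  # session-start symbol, so the first action is also conditioned


def train_bigram(sessions):
    """Count action-to-action transitions across training sessions."""
    counts = defaultdict(Counter)
    vocab = set()
    for session in sessions:
        prev = BOS
        for action in session:
            counts[prev][action] += 1
            vocab.add(action)
            prev = action
    return counts, sorted(vocab)


def prob(counts, vocab, prev, action):
    """Add-one (Laplace) smoothed estimate of P(action | prev)."""
    row = counts.get(prev, Counter())
    return (row[action] + 1) / (sum(row.values()) + len(vocab))


def perplexity(counts, vocab, sessions):
    """Per-action perplexity 2^(-(1/n) * sum of log2 P(action | prev)).

    Assumes every test action also appears in the training vocabulary.
    """
    log_sum, n = 0.0, 0
    for session in sessions:
        prev = BOS
        for action in session:
            log_sum += math.log2(prob(counts, vocab, prev, action))
            n += 1
            prev = action
    return 2 ** (-log_sum / n)


def predict_next(counts, vocab, history):
    """Return the most probable next action given the last action seen."""
    prev = history[-1] if history else BOS
    return max(vocab, key=lambda a: prob(counts, vocab, prev, a))


if __name__ == "__main__":
    # Toy training sessions, invented for illustration (not real PubMed logs).
    train = ["QAAN", "QNA", "QAF", "QQA"]
    counts, vocab = train_bigram(train)
    print(predict_next(counts, vocab, "Q"))   # A
    print(perplexity(counts, vocab, ["QA"]))  # about 1.897
```

Lower perplexity on held-out sessions means the model finds real user behavior less surprising; a higher n conditions on longer action histories at the cost of sparser counts, which is why some form of smoothing is needed.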