Identifying training sets for personalized article retrieval system

  • Authors:
  • Cen Li;Sachintha Pitigala;Suk J. Seo

  • Affiliations:
  • Middle Tennessee State University, Murfreesboro, TN;Middle Tennessee State University, Murfreesboro, TN;Middle Tennessee State University, Murfreesboro, TN

  • Venue:
  • Proceedings of the 49th Annual Southeast Regional Conference
  • Year:
  • 2011

Abstract

Retrieving documents relevant to a particular researcher's purpose is a major challenge, especially when searching a large database such as PubMed. Researchers who use traditional keyword-based document retrieval systems often end up with a large collection of documents that are not directly relevant to their needs. What is needed is a personalized document retrieval system that selects only the articles relevant to one's specific research interests. Obtaining an appropriate training data set is essential for building and testing such personalized article retrieval systems. This study describes one approach to forming such a training data set from articles that domain experts have categorized under MeSH major topics. Text classifiers learned with Support Vector Machines were used to test the degree to which the training set categories are differentiable. Preliminary results and an analysis of those results are discussed.
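
The differentiability test mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the features, kernel, or SVM implementation used, so everything below is an assumption: a bag-of-words representation, a linear SVM trained with a Pegasos-style subgradient method, and a handful of invented toy snippets standing in for articles from two MeSH-like categories. The function names are hypothetical and none of the data comes from the study.

```python
from collections import Counter

def bag_of_words(text):
    """Map a document to term counts (lowercased, whitespace-tokenized)."""
    return Counter(text.lower().split())

def score(w, x):
    """Dot product between a sparse weight dict and a term-count vector."""
    return sum(w.get(term, 0.0) * count for term, count in x.items())

def train_linear_svm(docs, labels, epochs=20, lam=0.01):
    """Pegasos-style subgradient descent on the regularized hinge loss.

    docs are term-count vectors, labels are +1 or -1.
    Assumption: a linear kernel; the paper's actual SVM setup is not given.
    """
    w = {}
    t = 0
    for _ in range(epochs):
        for x, y in zip(docs, labels):
            t += 1
            eta = 1.0 / (lam * t)              # decaying step size
            violated = y * score(w, x) < 1.0   # margin check before the update
            for term in w:                     # regularization shrink step
                w[term] *= 1.0 - eta * lam
            if violated:                       # hinge-loss subgradient step
                for term, count in x.items():
                    w[term] = w.get(term, 0.0) + eta * y * count
    return w

def predict(w, text):
    """Return +1 or -1 for a raw text document."""
    return 1 if score(w, bag_of_words(text)) > 0 else -1

# Invented toy snippets (NOT data from the paper): +1 and -1 stand for
# two hypothetical MeSH major topics.
train = [
    ("tumor growth carcinoma biopsy", 1),
    ("insulin glucose pancreas", -1),
    ("carcinoma metastasis tumor", 1),
    ("glucose insulin resistance", -1),
]
held_out = [("tumor biopsy", 1), ("insulin resistance", -1)]

weights = train_linear_svm([bag_of_words(t) for t, _ in train],
                           [label for _, label in train])

# Held-out accuracy serves as a rough proxy for how differentiable
# the two categories are.
accuracy = sum(predict(weights, t) == label for t, label in held_out) / len(held_out)
print(f"held-out accuracy: {accuracy:.2f}")
```

In this sketch, high held-out accuracy suggests that the two categories are easy to tell apart from their vocabulary, while accuracy near chance would suggest that the categories overlap too much to serve as distinct training classes. A realistic experiment would use many more articles, proper tokenization, and cross-validation.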