Support vector machines: relevance feedback and information retrieval

  • Authors:
  • Harris Drucker; Behzad Shahraray; David C. Gibbon

  • Affiliations:
  • AT&T Research and Monmouth University, West Long Branch, NJ; AT&T Research, 200 Laurel Ave., Middletown, NJ; AT&T Research, 200 Laurel Ave., Middletown, NJ

  • Venue:
  • Information Processing and Management: an International Journal
  • Year:
  • 2002

Abstract

We compare support vector machines (SVMs) with the Rocchio, Ide regular, and Ide dec-hi algorithms for information retrieval (IR) of text documents using relevance feedback. We assume that a preliminary search returns a set of documents that the user marks as relevant or not relevant, after which feedback iterations begin. We pay particular attention to searches in which the database contains few relevant documents and the preliminary set used to start the search has few relevant documents. Experiments show that when inverse document frequency (IDF) weighting is not used, because one is unwilling to pay the time penalty needed to obtain these features, SVMs perform better with either term-frequency (TF) or binary weighting. When TF-IDF weighting is used and the preliminary search finds a reasonable number of relevant documents, SVM performance is marginally better than Ide dec-hi. If the preliminary search is so poor that one has to search through many documents to find at least one relevant document, then SVMs are preferred.
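
The three baseline algorithms named in the abstract are usually described as query-vector updates. The sketch below illustrates their textbook forms, not the exact variants or parameter settings evaluated in the paper. The function names and the default values of alpha, beta, and gamma are assumptions made for illustration. Documents and queries are plain lists of term weights (binary, TF, or TF-IDF), and the SVM approach is not shown.

```python
from typing import List, Sequence

Vector = List[float]


def _add(a: Sequence[float], b: Sequence[float]) -> Vector:
    return [x + y for x, y in zip(a, b)]


def _sub(a: Sequence[float], b: Sequence[float]) -> Vector:
    return [x - y for x, y in zip(a, b)]


def _scale(a: Sequence[float], s: float) -> Vector:
    return [x * s for x in a]


def _sum(vectors: Sequence[Sequence[float]], dim: int) -> Vector:
    total = [0.0] * dim
    for v in vectors:
        total = _add(total, v)
    return total


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Textbook Rocchio: move toward the mean relevant document and away
    from the mean non-relevant one. The defaults are illustrative assumptions."""
    dim = len(query)
    new_query = _scale(query, alpha)
    if relevant:
        new_query = _add(new_query, _scale(_sum(relevant, dim), beta / len(relevant)))
    if nonrelevant:
        new_query = _sub(new_query, _scale(_sum(nonrelevant, dim), gamma / len(nonrelevant)))
    return new_query


def ide_regular(query, relevant, nonrelevant):
    """Ide regular: add every relevant vector, subtract every non-relevant one."""
    dim = len(query)
    return _sub(_add(query, _sum(relevant, dim)), _sum(nonrelevant, dim))


def ide_dec_hi(query, relevant, nonrelevant_ranked):
    """Ide dec-hi: add every relevant vector, subtract only the highest-ranked
    non-relevant one. `nonrelevant_ranked` is assumed to be in retrieval order."""
    dim = len(query)
    new_query = _add(query, _sum(relevant, dim))
    if nonrelevant_ranked:
        new_query = _sub(new_query, nonrelevant_ranked[0])
    return new_query


# Toy three-term vocabulary; the weights are made up for illustration.
query = [1.0, 0.0, 0.0]
relevant = [[1.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
nonrelevant = [[0.0, 0.0, 1.0], [0.0, 0.0, 2.0]]

print(rocchio(query, relevant, nonrelevant, alpha=1.0, beta=1.0, gamma=1.0))
print(ide_regular(query, relevant, nonrelevant))
print(ide_dec_hi(query, relevant, nonrelevant))
```

In each feedback iteration the updated query would be used to re-rank the unseen documents, for example by inner product or cosine similarity, and the user's new judgments would feed the next update.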