An information retrieval model based on vector space method by supervised learning

  • Authors:
  • Xiaoying Tai;Fuji Ren;Kenji Kita

  • Affiliations:
  • Faculty of Engineering, Tokushima University, 2-1, Minami-josanjima, Tokushima 770-8506, Japan;Faculty of Engineering, Tokushima University, 2-1, Minami-josanjima, Tokushima 770-8506, Japan;Faculty of Engineering, Tokushima University, 2-1, Minami-josanjima, Tokushima 770-8506, Japan

  • Venue:
  • Information Processing and Management: an International Journal
  • Year:
  • 2002

Abstract

This paper proposes a method to improve the retrieval performance of the vector space model (VSM), in part by using user-supplied information about which documents are relevant to a given query. In addition to this relevance feedback, information such as the original document similarities is incorporated into the retrieval model, which is built from a sequence of linear transformations. The high-dimensional, sparse vectors are then reduced by singular value decomposition (SVD) and mapped into a low-dimensional vector space, namely the space representing the latent semantic meanings of words. The method was tested on two test collections, the Medline collection and the Cranfield collection. To train the model, multiple partitions were created for each collection. Averaged over all partitions, the improvements in average precision relative to the latent semantic indexing (LSI) model are 20.57% (Medline) and 22.23% (Cranfield) on the training data, and 0.47% (Medline) and 4.78% (Cranfield) on the test data. The proposed method makes it possible to preserve user-supplied relevance information in the system over the long term so that it can be reused later.
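To make the SVD step concrete, the following is a minimal sketch of the LSI baseline that the abstract compares against, not the authors' supervised model (the abstract does not specify its linear transformations). The function names `lsi_fit` and `lsi_rank`, the toy term-by-document matrix, and the choice `k=2` are all assumptions made for illustration.

```python
import numpy as np

def lsi_fit(term_doc, k):
    """Truncated SVD of a term-by-document matrix (terms in rows).

    Returns the top-k left singular vectors and the documents
    projected into the k-dimensional latent space (one row per document).
    """
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k = U[:, :k]
    doc_vecs = term_doc.T @ U_k  # same as V_k * s_k
    return U_k, doc_vecs

def lsi_rank(query, U_k, doc_vecs):
    """Project a query with the same map and rank documents by cosine similarity."""
    q = query @ U_k
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q)
    sims = (doc_vecs @ q) / (norms + 1e-12)  # small constant guards against division by zero
    return np.argsort(-sims), sims

# Hypothetical toy collection: 4 terms (rows) by 4 documents (columns).
# Documents 0-1 use terms 0-1; documents 2-3 use terms 2-3.
term_doc = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 2.0],
    [0.0, 0.0, 0.0, 2.0],
])

U_k, doc_vecs = lsi_fit(term_doc, k=2)

# Query containing only term 1, which document 1 does not contain.
query = np.array([0.0, 1.0, 0.0, 0.0])
order, sims = lsi_rank(query, U_k, doc_vecs)
print(order, np.round(sims, 3))
```

In this toy case document 1 shares no term with the query, yet it should still score highly in the reduced space because it co-occurs with document 0 on term 0. This is the "latent semantic" effect the abstract refers to. A supervised variant such as the one proposed would additionally adjust the space using relevance judgments.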