Improvement of vector space information retrieval model based on supervised learning

Authors:
Xiaoying Tai;Minoru Sasaki;Yasuhito Tanaka;Kenji Kita
Affiliations:
Faculty of Engineering, Tokushima University, 2-1, Minami-josanjima, Tokushima 770-8506, Japan;Faculty of Engineering, Tokushima University, 2-1, Minami-josanjima, Tokushima 770-8506, Japan;Department of Economics & Information Science, Hyogo University, 2301 Shinzaike Hiraoka-cho Kakogawa, Hyogo 675-01, Japan;Faculty of Engineering, Tokushima University, 2-1, Minami-josanjima, Tokushima 770-8506, Japan
Venue:
IRAL '00 Proceedings of the fifth international workshop on Information retrieval with Asian languages
Year:
2000

Citing 3
Cited 1

Information retrieval: data structures and algorithms

Information retrieval: data structures and algorithms
Using linear algebra for intelligent information retrieval

SIAM Review
User lenses—achieving 100% precision on frequently asked questions

UM '99 Proceedings of the seventh international conference on User modeling

A unified maximum likelihood approach to document retrieval

Journal of the American Society for Information Science and Technology - Visual based retrieval systems and web mining

Abstract

This paper proposes a method to improve the retrieval performance of the vector space model (VSM) by utilizing user-supplied information about which documents are relevant to a given query. In addition to the user's relevance feedback, information such as inter-document similarity values is incorporated into the retrieval model, which is built using a sequence of linear transformations. The high-dimensional, sparse vectors are then reduced by singular value decomposition (SVD) and transformed into a low-dimensional vector space, namely the space representing the latent semantic meanings of the words. The method was evaluated on two test collections, the Medline collection and the Cranfield collection. Compared with the latent semantic indexing (LSI) model, average precision improved by 4.03% (Medline) and 24.87% (Cranfield) on the training data sets, and by 0.01% (Medline) and 4.89% (Cranfield) on the test data. The proposed method makes it possible to preserve user-supplied relevance information in the system over the long term and to use it later.
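
The SVD reduction step mentioned in the abstract can be illustrated with a minimal sketch of plain LSI retrieval, which is the baseline the paper compares against. This is not the authors' supervised method: the relevance-feedback and inter-document similarity transformations are not specified in the abstract and are omitted here. The function names `lsi_fit` and `lsi_rank` and the toy term-by-document matrix are hypothetical, introduced only for illustration.

```python
import numpy as np

def lsi_fit(term_doc, k):
    """Truncated SVD of a term-by-document matrix (hypothetical helper).

    Returns U_k (terms x k), the top-k singular values, and one
    k-dimensional latent vector per document (documents x k).
    """
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k, s_k = U[:, :k], s[:k]
    # Column j of S_k V_k^T equals U_k^T a_j, the projection of document j.
    doc_vecs = (np.diag(s_k) @ Vt[:k, :]).T
    return U_k, s_k, doc_vecs

def lsi_rank(query, U_k, doc_vecs):
    """Project a term-space query with U_k^T and rank documents by cosine."""
    q = U_k.T @ query
    denom = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q)
    scores = (doc_vecs @ q) / np.where(denom == 0, 1.0, denom)
    return np.argsort(-scores), scores

# Toy data (an assumption, not from the paper): rows are terms, columns are
# documents. Documents 0-1 use only terms 0-1; documents 2-3 use only terms 2-3.
A = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 1.0],
    [0.0, 0.0, 1.0, 3.0],
])

U_k, s_k, doc_vecs = lsi_fit(A, k=2)
query = np.array([1.0, 0.0, 0.0, 0.0])  # a query containing only term 0
order, scores = lsi_rank(query, U_k, doc_vecs)
print(order, np.round(scores, 3))
```

In this toy case the query mentions only term 0, yet both documents in the first topic score equally high in the latent space, which is the kind of term-association effect the low-dimensional space is meant to capture. A supervised variant along the lines of the abstract would modify the matrix before the SVD step; how exactly is not stated in this record.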