An information retrieval model based on vector space method by supervised learning

Authors:
Xiaoying Tai;Fuji Ren;Kenji Kita
Affiliations:
Faculty of Engineering, Tokushima University, 2-1, Minami-josanjima, Tokushima 770-8506, Japan;Faculty of Engineering, Tokushima University, 2-1, Minami-josanjima, Tokushima 770-8506, Japan;Faculty of Engineering, Tokushima University, 2-1, Minami-josanjima, Tokushima 770-8506, Japan
Venue:
Information Processing and Management: an International Journal
Year:
2002

Abstract

This paper proposes a method for improving the retrieval performance of the vector space model (VSM), in part by using information supplied by users about which documents are relevant to a given query. In addition to this relevance feedback, other information, such as the original similarities between documents, is incorporated into the retrieval model, which is built from a sequence of linear transformations. The high-dimensional, sparse vectors are then reduced by singular value decomposition (SVD) and mapped into a low-dimensional vector space that represents the latent semantic meanings of words. The method was tested on two test collections, Medline and Cranfield, and multiple partitions of each collection were created to train the model. Compared with the latent semantic indexing (LSI) model, the improvement in average precision, averaged over all partitions, is 20.57% (Medline) and 22.23% (Cranfield) on the training data, and 0.47% (Medline) and 4.78% (Cranfield) on the test data. The proposed method also makes it possible to retain user-supplied relevance information in the system over the long term so that it can be reused later.
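The abstract does not spell out the supervised linear transformations, so the sketch below only illustrates the LSI baseline the method is compared against. It takes a truncated SVD of a term-document matrix, folds a query into the rank-k space, ranks documents by cosine similarity, and scores the ranking with average precision. Everything here is an assumption for illustration: the function names `lsi_scores` and `average_precision`, the toy matrix, folding the query as U_k^T q (other LSI variants also scale by the inverse singular values), and non-interpolated average precision (the paper may use an interpolated variant).

```python
import numpy as np


def lsi_scores(term_doc: np.ndarray, query: np.ndarray, k: int) -> np.ndarray:
    """Cosine scores of every document against a query in a rank-k LSI space.

    term_doc is the (terms x docs) weight matrix A; query is a term vector q.
    Hypothetical helper illustrating plain LSI, not the paper's model.
    """
    # Thin SVD A = U diag(s) Vt; keep only the k largest singular triplets.
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    u_k, s_k, vt_k = u[:, :k], s[:k], vt[:k, :]
    # Document j in the latent space is U_k^T a_j, i.e. column j of diag(s_k) @ Vt_k.
    docs_k = (s_k[:, None] * vt_k).T            # shape (docs, k)
    # Fold the query into the same space: U_k^T q.
    q_k = u_k.T @ query                         # shape (k,)
    # Cosine similarity; guard against zero-length vectors.
    norms = np.linalg.norm(docs_k, axis=1) * np.linalg.norm(q_k)
    return (docs_k @ q_k) / np.where(norms == 0, 1.0, norms)


def average_precision(ranking: list[int], relevant: set[int]) -> float:
    """Non-interpolated average precision of one ranked list."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank   # precision at the rank of each relevant document
    return total / len(relevant) if relevant else 0.0


# Hypothetical toy collection (rows: heart, blood, wing, lift; columns: d0..d3).
A = np.array([[1, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 1, 1]], dtype=float)
q = np.array([0, 1, 0, 0], dtype=float)   # query: "blood"

scores = lsi_scores(A, q, k=2)
ranking = np.argsort(-scores, kind="stable").tolist()
print(np.round(scores, 3))
print(average_precision(ranking, relevant={0, 1}))
```

In this toy run, d1 contains only "heart", yet it scores close to 1.0 for the query "blood" because the two terms co-occur in d0 and collapse onto the same latent dimension when k=2. This is the effect LSI relies on. The paper's contribution is to learn additional transformations from relevance feedback on top of such a space.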