The vector space model (VSM) is one of the most widely used information retrieval (IR) models in both academia and industry. In the NTCIR-3 evaluation workshop, the VSM was less effective at Chinese ad hoc retrieval than other retrieval models, although it was comparable to them in the NTCIR-4 and NTCIR-5 workshops. It was unclear whether this weaker performance was due to inherent deficiencies of the VSM or to less effective document-length normalization. To find out, we evaluated the VSM with various pivoted document-length normalizations on the NTCIR-3 collection. We found that, with pivoted normalization, the VSM's retrieval effectiveness was comparable to that of other competitive retrieval models (for example, the 2-Poisson model), and that its retrieval speed was similar to theirs. We then proposed a novel adaptive scheme, called adaptive pivoted document-length normalization, that automatically estimates (near-)best parameters for pivoted document-length normalization based on query size. This scheme achieved good retrieval effectiveness, in some cases for short (title) queries and in others for long queries, without manual tuning of parameter values. We also found that a single adaptive pivoted normalization can improve on fixed pivoted normalizations across different test collections (TREC-5 and TREC-6). Finally, we evaluated the VSM with adaptive pivoted normalization under pseudo-relevance feedback (PRF) and found that it performs similarly to the competitive retrieval models (2-Poisson) with PRF. We therefore conclude that the VSM with a single adaptive pivoted document-length normalization is effective for Chinese IR, and that its retrieval effectiveness is comparable to that of other competitive retrieval models, with or without PRF, on the reference test collections used in this evaluation.
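The abstract refers to pivoted document-length normalization without stating its form. As a hedged illustration only, the sketch below uses the general form introduced by Singhal, Buckley and Mitra (1996), in which each document's normalization factor becomes (1 − s) · pivot + s · old_norm, where s is the slope and the pivot is typically the average old normalization factor over the collection. Everything else here is an assumption made for the example: the hypothetical class `PivotedVSM`, log-tf × idf document weights, a smoothed idf, the cosine norm as the "old" normalization, and the mean cosine norm as the pivot. The paper's adaptive estimator of the parameters from query size is not reproduced; the slope is simply passed in per query, so an adaptive rule could be plugged in at that point.

```python
import math
from collections import Counter
from typing import Dict, List


def term_weights(tokens: List[str], idf: Dict[str, float]) -> Dict[str, float]:
    """Log-tf x idf weights for one document (an assumed weighting scheme)."""
    tf = Counter(tokens)
    return {t: (1.0 + math.log(c)) * idf[t] for t, c in tf.items()}


class PivotedVSM:
    """Hypothetical VSM ranker with pivoted document-length normalization."""

    def __init__(self, docs: List[List[str]]):
        n = len(docs)
        df = Counter(t for d in docs for t in set(d))
        # Smoothed idf (assumption): stays positive even for terms in every document.
        self.idf = {t: math.log((n + 1) / c) for t, c in df.items()}
        self.doc_vecs = [term_weights(d, self.idf) for d in docs]
        # "Old" normalization factor: the cosine (Euclidean) norm of each document vector.
        self.old_norms = [math.sqrt(sum(w * w for w in v.values())) for v in self.doc_vecs]
        # Pivot: the average old norm over the collection (assumed choice).
        self.pivot = sum(self.old_norms) / n

    def norm(self, i: int, slope: float) -> float:
        # Pivoted normalization: (1 - slope) * pivot + slope * old_norm.
        return (1.0 - slope) * self.pivot + slope * self.old_norms[i]

    def score(self, query: List[str], i: int, slope: float) -> float:
        qtf = Counter(query)
        vec = self.doc_vecs[i]
        # Dot product of log-tf query weights with the document's term weights.
        dot = sum((1.0 + math.log(c)) * vec[t] for t, c in qtf.items() if t in vec)
        denom = self.norm(i, slope)
        return dot / denom if denom > 0 else 0.0

    def rank(self, query: List[str], slope: float) -> List[int]:
        # Sort by descending score, breaking ties by document index.
        scored = [(self.score(query, i, slope), i) for i in range(len(self.doc_vecs))]
        return [i for _, i in sorted(scored, key=lambda x: (-x[0], x[1]))]


docs = [["a", "b"], ["a", "c", "c", "d", "e"], ["b", "b"]]
model = PivotedVSM(docs)
print(model.rank(["a"], slope=0.3))
```

With a slope of 1 the factor reduces to the old (cosine) norm, and with a slope of 0 every document shares the pivot. For slopes between 0 and 1, documents whose old norm exceeds the pivot are penalized less than under plain cosine normalization, while shorter documents are penalized more, which is the tilting that pivoted normalization is meant to achieve.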