Adapting pivoted document-length normalization for query size: Experiments in Chinese and English

Authors:
Tze Leung Chung;Robert Wing Pong Luk;Kam Fai Wong;Kui Lam Kwok;Dik Lun Lee
Affiliations:
The Hong Kong Polytechnic University;The Hong Kong Polytechnic University;The Chinese University of Hong Kong;Queens College, City University of New York;The Hong Kong University of Science and Technology
Venue:
ACM Transactions on Asian Language Information Processing (TALIP)
Year:
2006

Abstract

The vector space model (VSM) is one of the most widely used information retrieval (IR) models in both academia and industry. It was less effective on the Chinese ad hoc retrieval tasks than other retrieval models in the NTCIR-3 evaluation workshop, but comparable to those in the NTCIR-4 and NTCIR-5 workshops. We do not know whether the lower performance was due to the VSM's inherent deficiencies or to a less effective normalization of document length. Hence, we evaluated the VSM with various pivoted normalizations of document length using the NTCIR-3 collection for confirmation. We found that the VSM's retrieval effectiveness with pivoted normalization was comparable to that of other competitive retrieval models (for example, 2-Poisson), and that the VSM's retrieval speed with pivoted normalization was similar to that of competitive retrieval models (2-Poisson). We proposed a novel adaptive scheme that automatically estimates the (near) best parameters for pivoted document-length normalization based on query size; the new normalization is called adaptive pivoted document-length normalization. This scheme achieved good retrieval effectiveness, sometimes for short (title) queries and sometimes for long queries, without manually adjusting parameter values. We found that unique, adaptive pivoted normalization can enhance fixed pivoted normalizations for different test collections (TREC-5 and TREC-6). We also evaluated the VSM with the adaptive pivoted normalization with pseudo-relevance feedback (PRF) and found that this type of VSM performs similarly to the competitive retrieval models (2-Poisson) with PRF. Hence, we conclude that the VSM with unique (adaptive) pivoted document-length normalization is effective for Chinese IR and that its retrieval effectiveness is comparable to that of other competitive retrieval models with or without PRF for the reference test collections used in this evaluation.
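
As a rough illustration of the idea summarized in the abstract, the sketch below shows the standard pivoted normalization factor, (1 - slope) * pivot + slope * length, together with a slope chosen from the query size. The abstract does not state how the adaptive scheme estimates its parameters, so `slope_for_query`, its linear interpolation, and every default value (`short_slope`, `long_slope`, `long_query`) are hypothetical placeholders, not the authors' method. The direction of the interpolation is likewise arbitrary.

```python
def pivoted_norm(doc_len: float, pivot: float, slope: float) -> float:
    """Pivoted length normalization factor.

    A document whose length equals the pivot keeps its original
    normalization; the slope controls how strongly other lengths count.
    """
    if not 0.0 <= slope <= 1.0:
        raise ValueError("slope must lie in [0, 1]")
    return (1.0 - slope) * pivot + slope * doc_len


def slope_for_query(query_size: int,
                    short_slope: float = 0.2,
                    long_slope: float = 0.6,
                    long_query: int = 30) -> float:
    """HYPOTHETICAL: pick a slope from the number of query terms.

    Linear interpolation between two placeholder slopes; this is an
    assumption for illustration, not the estimator from the paper.
    """
    clamped = min(max(query_size, 0), long_query)
    frac = clamped / long_query
    return short_slope + frac * (long_slope - short_slope)


def normalized_score(raw_score: float, doc_len: float,
                     pivot: float, query_size: int) -> float:
    """Divide an unnormalized VSM score by the query-adapted pivoted norm."""
    slope = slope_for_query(query_size)
    return raw_score / pivoted_norm(doc_len, pivot, slope)
```

With the pivot set to the average document length, a document of exactly average length is normalized identically for any slope, so changing the query size only shifts the relative ranking of shorter and longer documents.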