Pivoted document length normalization
SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Improvement of HITS-based algorithms on web documents
Proceedings of the 11th international conference on World Wide Web
SSDBM '97 Proceedings of the Ninth International Conference on Scientific and Statistical Database Management
Challenges in enterprise search
ADC '04 Proceedings of the 15th Australasian database conference - Volume 27
Concept-Based Information Retrieval Using Explicit Semantic Analysis
ACM Transactions on Information Systems (TOIS)
Nonlinear transformation of term frequencies for term weighting in text categorization
Engineering Applications of Artificial Intelligence
Hi-index | 0.01 |
In the TREC collection -a large full-text experimental text collection with widely varying document lengths -we observe that the likelihood of a document being judged relevant by a user increases with the document length. We show that a retrieval strategy, such as the vector-space cosine match, that retrieves documents of different lengths with roughly equal probability, will not optimally retrieve useful documents from such a collection. We present a modified technique that attempts to match the likelihood of retrieving a document of a certain length to the likelihood of documents of that length being judged relevant, and show that this technique yields significant improvements in retrieval effectiveness.