Pivoted document length normalization
SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
A study of smoothing methods for language models applied to Ad Hoc information retrieval
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Information Retrieval
Probabilistic models of information retrieval based on measuring the divergence from randomness
ACM Transactions on Information Systems (TOIS)
A study of the dirichlet priors for term frequency normalisation
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Generalized inverse document frequency
Proceedings of the 17th ACM conference on Information and knowledge management
A machine learning approach for improved BM25 retrieval
Proceedings of the 18th ACM conference on Information and knowledge management
Adaptive term frequency normalization for BM25
Proceedings of the 20th ACM international conference on Information and knowledge management
A log-logistic model-based interpretation of TF normalization of BM25
ECIR'12 Proceedings of the 34th European conference on Advances in Information Retrieval
A novel TF-IDF weighting scheme for effective ranking
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
About learning models with multiple query-dependent features
ACM Transactions on Information Systems (TOIS)
Hi-index | 0.00 |
The setting of the term frequency normalization hyper-parameter suffers from the query dependence and collection dependence problems, which remarkably hurt the robustness of the retrieval performance. Our study in this article investigates three term frequency normalization methods, namely normalization 2, BM25's normalization and the Dirichlet Priors normalization. We tackle the query dependence problem by modifying the query term weight using a Divergence From Randomness term weighting model, and tackle the collection dependence problem by measuring the correlation of the normalized term frequency with the document length. Our research hypotheses for the two problems, as well as an automatic hyper-parameter setting methodology, are extensively validated and evaluated on four Text REtrieval Conference (TREC) collections.