Most previous research on term weighting for information retrieval has focused on developing specialized parametric term weighting functions. Examples include TF.IDF vector-space formulations, BM25, and language modeling weighting. Each of these term weighting functions takes a specific parametric form. While these functions have proven highly effective, they impose strict constraints on the functional form of the term weights, and such constraints may degrade retrieval effectiveness. In this paper we propose two new classes of term weighting schemes, which we call semi-parametric and non-parametric weighting. These schemes make fewer assumptions about the underlying term weights and let the data speak for themselves. We argue that these robust weighting schemes have the potential to be significantly more effective than existing parametric schemes, especially as growing amounts of training data become available.
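The abstract does not spell out the estimators behind the proposed schemes. To make the parametric versus non-parametric contrast concrete, here is a minimal sketch under stated assumptions: a single feature (raw term frequency) and binary relevance judgments as training data. `bm25_term_weight` uses the standard BM25 term-frequency saturation with a Lucene-style non-negative IDF variant. The bin edges, the add-alpha log-odds estimator, and all function names are hypothetical illustrations, not the paper's method.

```python
import math
from bisect import bisect_right


def bm25_term_weight(tf, df, doc_len, avg_doc_len, num_docs, k1=1.2, b=0.75):
    """Parametric: the weight is forced into the fixed BM25 functional form."""
    # Lucene-style IDF variant (the "+ 1.0" keeps it non-negative).
    idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1.0)
    # Document-length normalization controlled by the free parameter b.
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    # Saturating tf transform controlled by the free parameter k1.
    return idf * tf * (k1 + 1.0) / (tf + norm)


# Hypothetical bin boundaries for term frequency (assumption, not from the paper).
# Bins: {0}, {1}, {2}, [3,5), [5,8), [8,13), [13, inf)
TF_BIN_EDGES = [1, 2, 3, 5, 8, 13]


def tf_bin(tf):
    """Map a raw term frequency to a bin index in 0..len(TF_BIN_EDGES)."""
    return bisect_right(TF_BIN_EDGES, tf)


def fit_binned_weights(examples, alpha=1.0):
    """Non-parametric (hypothetical): one free weight per tf bin, fit from data.

    examples: iterable of (tf, is_relevant) pairs for query-term/document pairs.
    Each bin's weight is the add-alpha smoothed log-odds of relevance, so no
    functional form in tf is assumed; empty bins get weight 0.0.
    """
    n_bins = len(TF_BIN_EDGES) + 1
    relevant = [0] * n_bins
    total = [0] * n_bins
    for tf, is_relevant in examples:
        i = tf_bin(tf)
        total[i] += 1
        relevant[i] += int(is_relevant)
    weights = []
    for r, t in zip(relevant, total):
        p = (r + alpha) / (t + 2.0 * alpha)
        weights.append(math.log(p / (1.0 - p)))
    return weights


def nonparametric_weight(tf, weights):
    """Look up the learned weight for the bin that tf falls into."""
    return weights[tf_bin(tf)]


if __name__ == "__main__":
    # Toy training pairs, invented purely for illustration.
    train = [(0, False), (0, False), (1, False), (1, True),
             (3, True), (4, True), (9, True)]
    weights = fit_binned_weights(train)
    print("BM25 weight, tf=3:", bm25_term_weight(3, 10, 100, 120, 1000))
    print("binned weight, tf=3:", nonparametric_weight(3, weights))
```

The design choice illustrated is where the flexibility lives: BM25 can only move along the curve shaped by `k1` and `b`, whereas the binned table can represent any step function of tf, at the cost of needing enough training data to populate every bin, which is consistent with the abstract's point about growing training sets.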