This paper reports on a theoretical investigation into the assumptions underlying the inverse document frequency (idf). We show that an intuitive idf-based probability function for the probability of a term being informative assumes disjoint document events. By assuming documents to be independent rather than disjoint, we arrive at a Poisson-based probability of being informative. The framework is useful for understanding and choosing how parameters are estimated and combined in probabilistic retrieval models.
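
The contrast between disjoint and independent document events can be illustrated with a small numerical sketch. It is not the paper's definition of the probability of being informative. It only shows how the two assumptions combine per-document evidence for a term, assuming a uniform document prior of 1/N and that a term counts as occurring in each of the n_t documents that contain it. The function names and the example numbers are hypothetical.

```python
import math


def occurrence_prob_disjoint(n_t: int, N: int) -> float:
    """Disjoint document events: the n_t contributions of 1/N simply add up."""
    return n_t / N


def occurrence_prob_independent(n_t: int, N: int) -> float:
    """Independent document events: 1 minus the probability that no document 'fires'."""
    return 1.0 - (1.0 - 1.0 / N) ** n_t


def occurrence_prob_poisson(n_t: int, N: int) -> float:
    """Poisson-style approximation of the independent case, with lambda = n_t / N."""
    lam = n_t / N
    return 1.0 - math.exp(-lam)


def idf(n_t: int, N: int) -> float:
    """Classical idf as the negative log of the disjoint occurrence probability."""
    return -math.log(n_t / N)


if __name__ == "__main__":
    N = 1000  # hypothetical collection size
    for n_t in (1, 100, 900):  # hypothetical document frequencies
        print(
            n_t,
            round(occurrence_prob_disjoint(n_t, N), 4),
            round(occurrence_prob_independent(n_t, N), 4),
            round(occurrence_prob_poisson(n_t, N), 4),
            round(idf(n_t, N), 4),
        )
```

For rare terms the three occurrence probabilities nearly coincide. For frequent terms the independent and Poisson values fall noticeably below n_t/N, which is where the choice between the two assumptions starts to matter.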