Existing interpretations of TF-IDF are based on binary independence retrieval, Poisson models, information theory, and language modelling. This paper first reviews those interpretations and then systematically relates TF-IDF to the probabilities P(q|d) and P(d|q). Two approaches are explored: a space of independent terms and a space of disjoint terms. For independent terms, an "extreme" query/non-query term assumption uncovers TF-IDF, and an analogy between P(d|q) and the probabilistic odds O(r|d, q) mirrors relevance feedback. For disjoint terms, a relationship between probability theory and TF-IDF is established through the integral ∫ 1/x dx = log x. The study shows that components such as divergence from randomness and pivoted document length are inherent parts of a document-query independence (DQI) measure, and that integrating the DQI over the term occurrence probability leads to TF-IDF.
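
The role of the integral can be illustrated numerically. The sketch below is not the paper's derivation of the DQI. It assumes the common definition IDF(t) = log(N / df(t)) and only checks the elementary identity that integrating 1/x from the term occurrence probability p = df/N up to 1 gives -log p, which equals that IDF. The function names, the midpoint-rule integrator, and the example counts are all hypothetical choices made for illustration.

```python
import math


def idf(df, n_docs):
    """Inverse document frequency, assuming the common form log(N / df)."""
    return math.log(n_docs / df)


def integral_of_reciprocal(lower, upper=1.0, steps=100_000):
    """Approximate the integral of 1/x over [lower, upper] with the midpoint rule."""
    width = (upper - lower) / steps
    return sum(1.0 / (lower + (i + 0.5) * width) for i in range(steps)) * width


def tf_idf(tf, df, n_docs):
    """Plain TF-IDF weight: raw term frequency times IDF (one of many variants)."""
    return tf * idf(df, n_docs)


# Hypothetical collection: 1000 documents, the term occurs in 10 of them.
n_docs, df = 1000, 10
p = df / n_docs                      # term occurrence probability
approx = integral_of_reciprocal(p)   # numerical integral of 1/x from p to 1

print(f"integral of 1/x from {p} to 1: {approx:.6f}")
print(f"idf = log(N/df):               {idf(df, n_docs):.6f}")
print(f"tf-idf for tf = 3:             {tf_idf(3, df, n_docs):.6f}")
```

The two printed values for the integral and the IDF should agree to within the integration error, which is the sense in which log-based term weights can be read as an integral over an occurrence probability.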