We exploit the Feller-Pareto characterization of the classical Pareto distribution to derive a law relating the probability of a given term frequency in a document to the document's length. A similar law was derived by Mandelbrot. We use the Paretian distribution to obtain a term frequency normalization that substitutes for the actual term frequency in the probabilistic models of Information Retrieval recently introduced in TREC-10. Preliminary results show that the framework's only parameter can be eliminated in favour of the term frequency normalization derived from the Paretian law.