This paper reports a study of the validity of the Multiple Poisson (nP) model of word distribution in document collections. An nP distribution is a mixture of n Poisson distributions with different means. We describe a practical algorithm for determining whether a given word is distributed according to an nP distribution and for computing the distribution parameters. The algorithm was applied to every word in four different document collections. It was found that over 70% of frequently occurring words and terms do follow nP distributions. The results indicate that the proportion of nP words depends on the collection size, the document length and the frequency of the individual words. Most of the nP words recognised are distributed according to a mixture of relatively few single Poisson distributions (two, three or four). There is an indication that the number of single Poisson components in the mixture depends on the collection frequency of words.
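
To make the nP definition concrete, the following is a minimal sketch of how a mixture of n Poisson distributions could be fitted to the per-document frequencies of a single word. The abstract does not specify the paper's algorithm, so this uses standard expectation-maximisation (EM) as a stand-in and should not be read as the authors' method. The function names (`poisson_pmf`, `mixture_pmf`, `fit_poisson_mixture`), the initialisation scheme and the iteration count are all assumptions made for illustration.

```python
import math
from collections import Counter


def poisson_pmf(k, lam):
    """P(X = k) for a single Poisson distribution with mean lam > 0."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def mixture_pmf(k, weights, means):
    """P(X = k) under an nP model: sum_j w_j * Poisson(k; mean_j)."""
    return sum(w * poisson_pmf(k, m) for w, m in zip(weights, means))


def fit_poisson_mixture(counts, n, iterations=500):
    """Fit an n-component Poisson mixture with plain EM (illustrative only).

    counts: within-document frequencies of one word, one entry per document.
    Returns (weights, means). The initial values are an arbitrary assumption:
    equal weights and means spread evenly over the observed range.
    """
    hist = Counter(counts)            # frequency value -> number of documents
    total_docs = len(counts)
    top = max(max(counts), 1)
    weights = [1.0 / n] * n
    means = [top * (j + 0.5) / n for j in range(n)]

    for _ in range(iterations):
        resp_sum = [0.0] * n          # expected number of documents per component
        resp_k_sum = [0.0] * n        # expected sum of frequencies per component
        # E-step: share each histogram bin among the components.
        for k, docs in hist.items():
            joint = [w * poisson_pmf(k, m) for w, m in zip(weights, means)]
            norm = sum(joint)
            if norm == 0.0:
                continue              # numerically impossible bin, skip it
            for j in range(n):
                r = docs * joint[j] / norm
                resp_sum[j] += r
                resp_k_sum[j] += r * k
        # M-step: re-estimate the mixing weights and the component means.
        for j in range(n):
            if resp_sum[j] > 0.0:
                weights[j] = resp_sum[j] / total_docs
                means[j] = max(resp_k_sum[j] / resp_sum[j], 1e-9)
    return weights, means


if __name__ == "__main__":
    # Made-up frequencies: most documents barely use the word, a few use it often.
    sample = [0] * 70 + [1] * 12 + [2] * 3 + [4] * 5 + [5] * 6 + [6] * 4
    w, m = fit_poisson_mixture(sample, 2)
    print("weights:", w)
    print("means:", m)
```

Deciding whether a word actually follows an nP distribution would additionally need a goodness-of-fit test comparing the observed histogram with `mixture_pmf`, repeated for increasing n. That step is omitted here because the abstract does not say which test the study used.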