This paper introduces the multinomial model of text classification and retrieval. One important feature of the model is that the term frequency (tf) statistic, which usually appears in probabilistic IR models as a heuristic, is an integral part of the model. Another is that the variable length of documents is accounted for without either assuming uniform length or applying length normalization. The independence assumptions of the multinomial model are similar to those made in previous probabilistic models, particularly the binary independence model and the 2-Poisson model. The paper describes the use of simulation to study the model and evaluates its performance on the TREC-3 routing task. Results are compared with the binary independence model and with the simulation studies.
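To illustrate how term frequency can enter a multinomial model directly rather than as an added heuristic, here is a minimal Python sketch of a multinomial log-odds score. It is not the paper's exact formulation: the function names, the toy corpora, and the add-alpha (Laplace) smoothing are assumptions made for illustration. The multinomial coefficient and any class prior are left out; the coefficient depends only on the document's term counts, so it cancels in the ratio between the two classes.

```python
import math
from collections import Counter

def fit_term_log_probs(docs, vocab, alpha=1.0):
    """Estimate log P(term | class) from token lists.
    Add-alpha smoothing is an assumption of this sketch."""
    counts = Counter(tok for doc in docs for tok in doc if tok in vocab)
    total = sum(counts.values()) + alpha * len(vocab)
    return {t: math.log((counts[t] + alpha) / total) for t in vocab}

def log_odds(doc, rel_lp, nonrel_lp):
    """Sum of tf * log-ratio over in-vocabulary terms.
    Each occurrence contributes once, so tf is built into the score."""
    tf = Counter(tok for tok in doc if tok in rel_lp)
    return sum(n * (rel_lp[t] - nonrel_lp[t]) for t, n in tf.items())

# Hypothetical toy training data for the two classes.
relevant = [["bayes", "text", "model"], ["text", "model", "model"]]
nonrelevant = [["stock", "market", "text"], ["market", "price"]]
vocab = {t for d in relevant + nonrelevant for t in d}

rel_lp = fit_term_log_probs(relevant, vocab)
nonrel_lp = fit_term_log_probs(nonrelevant, vocab)

score_once = log_odds(["model", "text"], rel_lp, nonrel_lp)
score_twice = log_odds(["model", "text", "model", "text"], rel_lp, nonrel_lp)
score_off_topic = log_odds(["market", "price"], rel_lp, nonrel_lp)
```

In this sketch, repeating every term of a document doubles its log-odds, which shows how both term frequency and document length affect the score without a separate normalization step.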