Language modeling is an effective and theoretically attractive probabilistic framework for text information retrieval. The basic idea of this approach is to estimate a language model for a given document (or document set) and then perform retrieval or classification based on this model. A common language modeling approach assumes the data D is generated from a mixture of several language models. The core problem is to find the maximum likelihood estimate of one mixture component, given the fixed mixture weights and the other mixture components. The EM algorithm is usually used to find the solution. In this paper, we prove that an exact maximum likelihood estimate of the unknown mixture component exists and can be computed by the algorithm we propose. We further improve the algorithm and obtain an efficient O(k) algorithm that finds the exact solution, where k is the number of words occurring at least once in the data D. Furthermore, we prove that the probabilities of many words are exactly zero under the maximum likelihood estimate, so the estimation procedure acts explicitly as a feature selection technique.
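To make the setup concrete, consider the common two-component case of the mixture described above: D is assumed to be drawn from lam * p(w|theta) + (1 - lam) * p(w|C), where the weight lam and the background model p(w|C) are fixed and theta is the unknown component to be estimated. The sketch below shows the usual EM iteration for this problem, alongside a closed-form solution derived from the KKT conditions of the constrained likelihood (on the support, theta(w) = c(w)/mu - ((1-lam)/lam) * p(w|C), with mu chosen so the positive parts sum to one). The symbols and the prefix-search derivation here are a reconstruction from the standard two-component formulation, not necessarily the paper's own algorithm.

```python
from collections import Counter

def em_mixture_mle(doc_words, background, lam=0.5, iters=200):
    """Iterative EM estimate of theta in lam*theta(w) + (1-lam)*background(w),
    with lam and the background model held fixed (the usual approach the
    abstract refers to). `background` maps each word to p(w|C)."""
    counts = Counter(doc_words)
    vocab = list(counts)
    theta = {w: 1.0 / len(vocab) for w in vocab}  # uniform init over words in D
    for _ in range(iters):
        # E-step: posterior probability that each word token came from theta.
        t = {w: lam * theta[w] / (lam * theta[w] + (1 - lam) * background[w])
             for w in vocab}
        # M-step: re-estimate theta from the fractional counts.
        norm = sum(counts[w] * t[w] for w in vocab)
        theta = {w: counts[w] * t[w] / norm for w in vocab}
    return theta

def exact_mixture_mle(counts, background, lam=0.5):
    """Closed-form MLE sketch for the same two-component problem.
    KKT stationarity gives theta(w) = c(w)/mu - r*b(w) on the support
    (r = (1-lam)/lam), so the support consists of the words with the
    largest count-to-background ratios c(w)/b(w); words below the
    threshold get probability exactly zero."""
    r = (1.0 - lam) / lam
    # Sort words by c(w)/b(w): low-ratio words drop out of the support first.
    words = sorted(counts, key=lambda w: counts[w] / background[w], reverse=True)
    c_sum = b_sum = 0.0
    best = None
    for i, w in enumerate(words):
        c_sum += counts[w]
        b_sum += background[w]
        mu = c_sum / (1.0 + r * b_sum)  # normalizer making the prefix sum to 1
        if counts[w] / mu - r * background[w] > 0:  # prefix still feasible
            best = (mu, i)
    mu, last = best
    theta = {w: 0.0 for w in counts}
    for w in words[: last + 1]:
        theta[w] = counts[w] / mu - r * background[w]
    return theta
```

The sort makes this sketch O(k log k); replacing it with a median-style selection over the c(w)/b(w) ratios would give the O(k) bound the abstract claims. The contrast the abstract draws is visible here: the closed form assigns exact zeros to low-ratio words, whereas EM only approaches those zeros asymptotically.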