A novel method for combining a bigram model with Probabilistic Latent Semantic Analysis (PLSA) is introduced for language modeling. The motivation behind this idea is to relax the "bag-of-words" assumption inherent in latent topic models, including PLSA. An EM-based parameter estimation technique for the proposed model is presented. Previous attempts to incorporate word order into the PLSA model are surveyed and compared with the proposed model, both theoretically and through experimental evaluation. Perplexity is used to compare the effectiveness of recently introduced models with the proposed model. In addition, experiments are carried out on continuous speech recognition (CSR) tasks using word error rate (WER) as the evaluation criterion. The results demonstrate the superiority of the new bigram-PLSA model over Nie et al.'s bigram-PLSA model and the standard PLSA model. Experiments on the BLLIP WSJ corpus show about a 12% reduction in perplexity and a 2.8% improvement in WER compared with Nie et al.'s bigram-PLSA model.
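To illustrate the kind of model the abstract describes, the sketch below (Python with NumPy) shows one way EM estimation and perplexity computation can be organized for a bigram-PLSA model, assuming the commonly used factorization P(w_t | w_{t-1}, d) = sum_z P(w_t | w_{t-1}, z) P(z | d). This is an assumed, simplified formulation for illustration only; the exact parameterization, smoothing, and estimation details of the model proposed in the paper (and of Nie et al.'s variant) may differ, and all function names and array shapes here are hypothetical.

# Minimal sketch of EM training and perplexity for a bigram-PLSA model
# (hypothetical implementation; assumes the factorization
#  P(w_t | w_{t-1}, d) = sum_z P(w_t | w_{t-1}, z) P(z | d),
#  which may differ in detail from the model in the paper).
import numpy as np

def train_bigram_plsa(counts, n_topics, n_iter=50, seed=0):
    """counts: array of shape (n_docs, vocab, vocab) with bigram counts n(d, w_prev, w)."""
    rng = np.random.default_rng(seed)
    n_docs, vocab, _ = counts.shape
    # P(w | w_prev, z): shape (n_topics, vocab, vocab), normalized over the last axis
    p_w = rng.random((n_topics, vocab, vocab))
    p_w /= p_w.sum(axis=2, keepdims=True)
    # P(z | d): shape (n_docs, n_topics), normalized over topics
    p_z = rng.random((n_docs, n_topics))
    p_z /= p_z.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: posterior P(z | d, w_prev, w), shape (n_docs, n_topics, vocab, vocab)
        joint = p_z[:, :, None, None] * p_w[None, :, :, :]
        post = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)
        # Expected counts: observed bigram counts weighted by the topic posterior
        weighted = counts[:, None, :, :] * post
        # M-step: re-estimate P(w | w_prev, z) and P(z | d)
        p_w = weighted.sum(axis=0)
        p_w /= p_w.sum(axis=2, keepdims=True) + 1e-12
        p_z = weighted.sum(axis=(2, 3))
        p_z /= p_z.sum(axis=1, keepdims=True) + 1e-12
    return p_w, p_z

def perplexity(counts, p_w, p_z):
    """Corpus perplexity under the bigram-PLSA model."""
    probs = np.einsum('dk,kvw->dvw', p_z, p_w)  # P(w | w_prev, d)
    mask = counts > 0
    log_lik = (counts[mask] * np.log(probs[mask] + 1e-12)).sum()
    return float(np.exp(-log_lik / counts.sum()))

Usage on toy data would look like p_w, p_z = train_bigram_plsa(counts, n_topics=8) followed by perplexity(counts, p_w, p_z). Note that dense (n_docs x vocab x vocab) arrays are feasible only for very small vocabularies; a practical implementation would iterate over sparse bigram counts instead.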