Stochastic language models are widely used in continuous speech recognition systems, where a priori probabilities of word sequences are needed. These probabilities are usually given by n-gram word models estimated on very large training texts. As n increases, reliable statistics become harder to obtain, even from huge texts. Grouping words into classes is one way to overcome this problem. We have developed an automatic, language-independent classification procedure that can optimize the classification of tens of millions of untagged words within a few hours on a Unix workstation. Using this approach, three corpora of newspaper text in French, German and English, each containing about 30 million words, have been mapped into different numbers of classes. From these classifications, bi-gram and tri-gram class language models have been built. Perplexities measured on held-out test texts show that tri-gram class models give lower values than tri-gram word models for all three languages.
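To make the idea of a class language model and its perplexity concrete, here is a minimal Python sketch of a bi-gram class model. It is an illustration only, not the classification procedure described above. It assumes a hand-written word-to-class map (the hypothetical `word2class`), the common factorization P(w | w_prev) = P(c | c_prev) * P(w | c), add-one smoothing over class transitions, and a held-out text whose words all occur in the training text. The function names `train_class_bigram` and `perplexity` are invented for this example.

```python
import math
from collections import Counter


def train_class_bigram(tokens, word2class):
    """Return a function giving P(word | previous word) under a class bi-gram model.

    The probability factorizes as P(class | previous class) * P(word | class).
    """
    classes = [word2class[w] for w in tokens]
    class_counts = Counter(classes)                      # occurrences of each class
    word_counts = Counter(tokens)                        # occurrences of each word
    transition_counts = Counter(zip(classes, classes[1:]))  # class bi-gram counts
    history_counts = Counter(classes[:-1])               # times a class precedes another
    n_classes = len(set(word2class.values()))

    def prob(prev_word, word):
        c_prev, c = word2class[prev_word], word2class[word]
        # Add-one smoothing over classes: an assumption of this sketch.
        p_class = (transition_counts[(c_prev, c)] + 1) / (
            history_counts[c_prev] + n_classes
        )
        # Relative frequency of the word inside its class.
        p_word = word_counts[word] / class_counts[c]
        return p_class * p_word

    return prob


def perplexity(prob, tokens):
    """Perplexity of a token sequence, conditioning each word on its predecessor."""
    pairs = list(zip(tokens, tokens[1:]))
    log_sum = sum(math.log2(prob(prev, word)) for prev, word in pairs)
    return 2 ** (-log_sum / len(pairs))


if __name__ == "__main__":
    # Toy data, invented for illustration.
    word2class = {
        "the": "DET", "a": "DET",
        "cat": "NOUN", "dog": "NOUN",
        "sat": "VERB", "ran": "VERB",
    }
    train = "the cat sat the dog ran a cat ran a dog sat".split()
    held_out = "a dog ran the cat sat".split()

    model = train_class_bigram(train, word2class)
    print(perplexity(model, held_out))
```

Because statistics are pooled per class, the model assigns a non-zero probability to "a dog" followed by "ran" even if that exact word pair is rare in training, which is the motivation for grouping words when n-gram counts become sparse.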