Experience with a stack decoder-based HMM CSR and back-off N-gram language models
HLT '91 Proceedings of the workshop on Speech and Natural Language
On the interaction between true source, training, and testing language models
HLT '90 Proceedings of the workshop on Speech and Natural Language
Evaluating natural language processing systems
Communications of the ACM
A freely available wide coverage morphological analyzer for English
COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 3
The design for the Wall Street Journal-based CSR corpus
HLT '91 Proceedings of the workshop on Speech and Natural Language
Applying SPHINX-II to the DARPA Wall Street Journal CSR task
HLT '91 Proceedings of the workshop on Speech and Natural Language
There has been a recent upsurge of interest in computational studies of large bodies of text. The aims of such studies vary widely, from lexicography and studies of language change to automatic indexing methods and statistical models for improving the performance of speech recognition systems and optical character readers. In general, corpus-based studies are critical for the development of adequate models of linguistic structure and for insights into the nature of language use. However, researchers have been severely hampered by the lack of appropriate materials, and specifically by the lack of a large enough body of text on which published results can be replicated or extended by others.