This paper discusses different methods of constructing language models for a corpus of medical texts written in an inflective language, namely Polish. The main result is a language modelling method that sequentially combines tri-grams of morphological base forms with tri-grams of word forms. Introducing base-form tri-grams improved two things: the overall performance of the combined model, measured as the gain in accuracy of handwriting OCR, and the model's ability to generalise. Generalisation was shown by training and testing on corpora of two different types. Detailed results of tests run on a large corpus of real-life medical language are discussed. An experimental OCR system for handwritten epicrises that uses the proposed model is also presented. The proposed language model reduces the overall error of the system by 64.2% (by 51% when the training and test corpora are of different types).
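The abstract does not say how the two tri-gram models are chained. The sketch below assumes one plausible reading of "sequentially combines". First, the base-form model ranks every reading of an OCR line and keeps a shortlist. Then the word-form model chooses among the shortlisted readings. Several details are hypothetical and are not taken from the paper: the add-one smoothing, the shortlist size `k`, the lemma lookup `LEMMA_OF`, the helper names `TrigramModel` and `decode`, and the toy Polish corpus. OCR classifier scores are ignored for brevity.

```python
import math
from collections import Counter
from itertools import product

BOS, EOS = "<s>", "</s>"


class TrigramModel:
    """Tri-gram model with add-one (Laplace) smoothing.

    Add-one smoothing is an assumption chosen for brevity; the abstract
    does not state which smoothing scheme the paper used.
    """

    def __init__(self, sentences):
        self.tri, self.hist = Counter(), Counter()
        vocab = {EOS}
        for sent in sentences:
            vocab.update(sent)
            toks = [BOS, BOS] + list(sent) + [EOS]
            for a, b, c in zip(toks, toks[1:], toks[2:]):
                self.tri[(a, b, c)] += 1   # count of tri-gram (a, b, c)
                self.hist[(a, b)] += 1     # count of its two-token history
        self.v = len(vocab)

    def logprob(self, sent):
        """Sum of log P(c | a, b) over the padded sentence."""
        toks = [BOS, BOS] + list(sent) + [EOS]
        return sum(
            math.log((self.tri[(a, b, c)] + 1) / (self.hist[(a, b)] + self.v))
            for a, b, c in zip(toks, toks[1:], toks[2:])
        )


def decode(candidates, lemma_of, base_lm, word_lm, k):
    """Hypothetical two-stage decoder for one line of OCR output.

    candidates: one list of alternative word forms per position.
    Stage 1 keeps the k readings scored best by base-form tri-grams;
    stage 2 returns the shortlisted reading preferred by word-form tri-grams.
    """
    readings = [list(r) for r in product(*candidates)]
    # Stage 1: score base forms; unknown word forms fall back to themselves.
    readings.sort(
        key=lambda r: base_lm.logprob([lemma_of.get(w, w) for w in r]),
        reverse=True,
    )
    # Stage 2: the word-form model picks the final reading from the shortlist.
    return max(readings[:k], key=word_lm.logprob)


# Toy data invented for illustration: two training sentences and a lemma lookup.
corpus = [
    ["pacjent", "zgłosił", "ból", "głowy"],
    ["pacjentka", "zgłosiła", "ból", "brzucha"],
]
LEMMA_OF = {
    "pacjent": "pacjent", "pacjentka": "pacjentka",
    "zgłosił": "zgłosić", "zgłosiła": "zgłosić",
    "ból": "ból", "głowy": "głowa", "głowa": "głowa", "brzucha": "brzuch",
}
word_lm = TrigramModel(corpus)
base_lm = TrigramModel([[LEMMA_OF[w] for w in s] for s in corpus])

# OCR alternatives per position; "bół" stands for a misread, unknown form.
candidates = [["pacjentka"], ["zgłosiła", "zgłosił"], ["ból", "bół"], ["głowy", "głowa"]]
print(decode(candidates, LEMMA_OF, base_lm, word_lm, k=4))
# → ['pacjentka', 'zgłosiła', 'ból', 'głowy']
```

In this toy run, the word-level tri-gram (zgłosiła, ból, głowy) never occurs in training, but its base-form counterpart (zgłosić, ból, głowa) does. Stage 1 therefore keeps the readings containing "ból" and drops those with the unknown "bół". Stage 2 then selects the inflected forms that the word-form tri-grams favour.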