Learning the lexicon from raw texts for open-vocabulary Korean word recognition

Authors:
Sungho Ryu;Jin Hyung Kim
Affiliations:
-;-
Venue:
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 1
Year:
2003

Abstract

In this paper, we propose a novel method of building a language model for open-vocabulary Korean word recognition. Due to the complex morphology of Korean, it is inappropriate to use lexicons based on linguistic entities such as words and morphemes in open-vocabulary domains. Instead, we build the lexicon by collecting variable-length character sequences from the raw texts using a dynamic Bayesian network model of the language. In simulated word recognition experiments, the proposed language model could find correct words from lattices of character candidates in 94.3% of cases, increasing the word recognition rates by 20.9%.
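
To make the idea of searching a lattice of character candidates with a lexicon of variable-length character sequences concrete, the following is a minimal sketch and not the method of the paper. It assumes a plain unigram model over lexicon entries in place of the dynamic Bayesian network, and every name (`decode`, `lattice`, `lexicon`) and every probability is hypothetical and invented for illustration.

```python
import math

def decode(lattice, lexicon):
    """Return (word, units) for the best-scoring path, or None if no path exists.

    lattice: list with one dict per character position, {candidate_char: prob}
             (hypothetical recognizer scores).
    lexicon: dict {character_sequence: prob}; a unigram stand-in, NOT the
             paper's dynamic Bayesian network.
    """
    n = len(lattice)
    # best[i] = (best log score covering positions 0..i-1, backpointer)
    best = [(-math.inf, None)] * (n + 1)
    best[0] = (0.0, None)
    for start in range(n):
        score_so_far = best[start][0]
        if score_so_far == -math.inf:
            continue  # position not reachable with lexicon entries
        for unit, unit_prob in lexicon.items():
            end = start + len(unit)
            if end > n:
                continue
            score = score_so_far + math.log(unit_prob)
            matched = True
            for offset, ch in enumerate(unit):
                p = lattice[start + offset].get(ch)
                if p is None:  # character is not among the candidates here
                    matched = False
                    break
                score += math.log(p)
            if matched and score > best[end][0]:
                best[end] = (score, (start, unit))
    if best[n][0] == -math.inf:
        return None
    units, pos = [], n
    while pos > 0:  # follow backpointers from the end of the lattice
        pos, unit = best[pos][1]
        units.append(unit)
    units.reverse()
    return "".join(units), units

# Toy data (all values are made up): the recognizer prefers the wrong
# character at the second position.
lattice = [
    {"학": 0.6, "합": 0.4},
    {"고": 0.55, "교": 0.45},
    {"에": 0.7, "어": 0.3},
    {"서": 0.6, "시": 0.4},
]
lexicon = {"학교": 0.02, "에서": 0.03, "학": 0.001, "고": 0.002,
           "에": 0.01, "서": 0.005, "합": 0.001}

top1 = "".join(max(c, key=c.get) for c in lattice)
result = decode(lattice, lexicon)
print(top1)    # 학고에서
print(result)  # ('학교에서', ['학교', '에서'])
```

In this toy case, picking the top candidate at each position independently yields "학고에서", while the lexicon-constrained search recovers "학교에서" because the longer entries "학교" and "에서" outweigh the recognizer's small preference for "고".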