Learning the lexicon from raw texts for open-vocabulary Korean word recognition

  • Authors: Sungho Ryu; Jin Hyung Kim

  • Venue: ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 1
  • Year: 2003

Abstract

In this paper, we propose a novel method of building a language model for open-vocabulary Korean word recognition. Due to the complex morphology of Korean, it is inappropriate to use lexicons based on linguistic entities such as words and morphemes in open-vocabulary domains. Instead, we build the lexicon by collecting variable-length character sequences from the raw texts using a dynamic Bayesian network model of the language. In simulated word recognition experiments, the proposed language model could find correct words from lattices of character candidates in 94.3% of cases, increasing the word recognition rates by 20.9%.
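
To make the idea of picking a word out of a lattice of character candidates concrete, the following is a minimal sketch only. It is not the dynamic Bayesian network model the abstract describes: it assumes a much simpler unigram model over character sequences counted from raw text, and a dynamic-programming search over segmentations. The names `build_lexicon` and `decode`, the `max_len` parameter, the toy texts, and the candidate probabilities are all hypothetical.

```python
import math
from collections import Counter
from itertools import product


def build_lexicon(texts, max_len=3):
    """Collect every character sequence of length 1..max_len from raw words.

    Returns a dict mapping each sequence to the log of its relative frequency.
    (Assumption: a plain unigram estimate, not the paper's Bayesian network.)
    """
    counts = Counter()
    for word in texts:
        for n in range(1, max_len + 1):
            for i in range(len(word) - n + 1):
                counts[word[i:i + n]] += 1
    total = sum(counts.values())
    return {seq: math.log(c / total) for seq, c in counts.items()}


def decode(lattice, lexicon, max_len=3):
    """Find the best-scoring string through a lattice of character candidates.

    lattice: list of dicts, one per character position, mapping each
             candidate character to its (hypothetical) recognizer probability.
    Returns "" if no path through the lattice can be built from the lexicon.
    """
    n = len(lattice)
    # best[i] = (score, text) of the best segmentation of positions 0..i-1
    best = [(-math.inf, "")] * (n + 1)
    best[0] = (0.0, "")
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            prev_score, prev_text = best[start]
            if prev_score == -math.inf:
                continue
            columns = [lattice[k].items() for k in range(start, end)]
            # Try every candidate string spanning positions start..end-1.
            for combo in product(*columns):
                seq = "".join(ch for ch, _ in combo)
                if seq not in lexicon:
                    continue
                score = (prev_score + lexicon[seq]
                         + sum(math.log(p) for _, p in combo))
                if score > best[end][0]:
                    best[end] = (score, prev_text + seq)
    return best[n][1]


# Toy raw text and a lattice whose top-ranked candidates spell a non-word.
lexicon = build_lexicon(["한국어", "한글", "국어"])
lattice = [
    {"한": 0.6, "힌": 0.4},
    {"극": 0.55, "국": 0.45},
    {"어": 0.9, "이": 0.1},
]
top1 = "".join(max(col, key=col.get) for col in lattice)  # "한극어"
result = decode(lattice, lexicon)                          # "한국어"
```

In this toy case the recognizer's top-ranked characters alone give "한극어", while the lexicon-constrained search recovers "한국어" because "극" and "힌" never occur in the raw text. Enumerating every candidate combination per span is exponential in `max_len`, which is acceptable only for a small sketch like this one.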