Recognition-based digitalization of Korean historical archives

  • Authors:
  • Min Soo Kim;Sungho Ryu;Kyu Tae Cho;Taik Heon Rhee;Hyun Il Choi;Jin Hyung Kim

  • Affiliations:
  • AIPR Lab., CS Div., Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea (all authors)

  • Venue:
  • AIRS'04 Proceedings of the 2004 international conference on Asian Information Retrieval Technology
  • Year:
  • 2004

Abstract

We present a recognition-based digitization method for building a digital library from a large volume of historical archives. Because most of the archives are manually transcribed in ancient Chinese characters, their digitization presents unique academic and pragmatic challenges. By integrating layout analysis and recognition into a single probabilistic framework, our system achieved a 95.1% character recognition rate on the test data set, despite the obsolete characters and unique variants used in the archives. Combined with an intuitive verification and correction interface, the system freed operators from repetitive typing tasks and significantly improved overall throughput.