Statistical Analysis of Bibliographic Strings for Constructing an Integrated Document Space

  • Authors:
  • Atsuhiro Takasu

  • Venue:
  • ECDL '02 Proceedings of the 6th European Conference on Research and Advanced Technology for Digital Libraries
  • Year:
  • 2002

Abstract

Making use of retrospective documents is important when constructing a large digital library. This paper proposes a method for analyzing bibliographic strings obtained by Optical Character Recognition (OCR), using an extended hidden Markov model. Because the method can analyze bibliographic strings that contain recognition errors, it can integrate many documents accumulated as printed articles into a citation index. It provides robust bibliographic matching by combining a statistical description of the syntax of bibliographic strings, a language model, and an OCR error model. It also reduces the cost of preparing training data for parameter estimation by using records in the bibliographic database.
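To make the idea of field-level HMM analysis concrete, here is a minimal Python sketch. It is not the paper's extended model. It uses a plain first-order HMM whose hidden states are bibliographic fields (the hypothetical set `author`, `title`, `journal`, `year`) and whose observations are coarse token classes produced by a hypothetical `token_class` function. Parameters are estimated by counting over token sequences generated from structured records, which mirrors the abstract's point that database records can stand in for hand-labelled training strings. A recognized string is then labelled with the Viterbi algorithm. The OCR error model and the real language model are omitted, and the records are toy data.

```python
import math
from collections import Counter

# Hypothetical field set and token classes; the paper's extended HMM is richer.
FIELDS = ["author", "title", "journal", "year"]
CLASSES = ["YEAR", "NUM", "INITIAL", "CAP", "LOWER"]


def token_class(tok):
    """Map a token to a coarse class (a stand-in for a real language model)."""
    core = tok.strip(".,;:()")
    if core.isdigit():
        return "YEAR" if len(core) == 4 else "NUM"
    if len(core) == 1 and core.isupper():
        return "INITIAL"
    return "CAP" if core[:1].isupper() else "LOWER"


def log_normalize(counts, rows, cols):
    """Add-one smoothed conditional log-probabilities log P(col | row)."""
    table = {}
    for r in rows:
        total = sum(counts[r, c] for c in cols) + len(cols)
        for c in cols:
            table[r, c] = math.log((counts[r, c] + 1) / total)
    return table


def estimate(records):
    """Count start, transition and emission events over token sequences
    built from structured records, so no hand-labelled strings are needed."""
    start, trans, emit = Counter(), Counter(), Counter()
    for rec in records:
        labelled = [(f, token_class(t)) for f in FIELDS for t in rec[f].split()]
        if not labelled:
            continue
        start[labelled[0][0]] += 1
        for (a, _), (b, _) in zip(labelled, labelled[1:]):
            trans[a, b] += 1
        for f, c in labelled:
            emit[f, c] += 1
    total = sum(start.values()) + len(FIELDS)
    log_start = {f: math.log((start[f] + 1) / total) for f in FIELDS}
    return (log_start, log_normalize(trans, FIELDS, FIELDS),
            log_normalize(emit, FIELDS, CLASSES))


def viterbi(tokens, log_start, log_trans, log_emit):
    """Return the most likely field label for each token (log-space Viterbi)."""
    if not tokens:
        return []
    obs = [token_class(t) for t in tokens]
    score = {f: log_start[f] + log_emit[f, obs[0]] for f in FIELDS}
    back = []
    for o in obs[1:]:
        prev, score, ptr = score, {}, {}
        for f in FIELDS:
            best = max(FIELDS, key=lambda p: prev[p] + log_trans[p, f])
            score[f] = prev[best] + log_trans[best, f] + log_emit[f, o]
            ptr[f] = best
        back.append(ptr)
    path = [max(FIELDS, key=score.get)]
    for ptr in reversed(back):  # follow back-pointers from the last token
        path.append(ptr[path[-1]])
    return path[::-1]


# Toy "database records" (hypothetical data).
records = [
    {"author": "Brown A.", "title": "Parsing of noisy reference strings",
     "journal": "Lecture Notes in Computer Science", "year": "2002"},
    {"author": "Smith J.", "title": "Hidden Markov models for text",
     "journal": "Information Retrieval", "year": "1999"},
    {"author": "Lee K.", "title": "Robust matching of citations",
     "journal": "Digital Libraries", "year": "2001"},
]
params = estimate(records)
labels = viterbi("Kim H. Error models for text Digital Libraries 2003".split(), *params)
print(labels)
# ['author', 'author', 'title', 'title', 'title', 'title', 'journal', 'journal', 'year']
```

Reducing tokens to coarse classes keeps the sketch short and tolerates some character-level noise. A system closer to the one described above would model OCR substitutions explicitly, for example as character confusion probabilities inside the emission distribution.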