A sequence labeling method using syntactical and textual patterns for record linkage

  • Authors: Atsuhiro Takasu
  • Affiliations: National Institute of Informatics, Tokyo, Japan
  • Venue: ICAPR'05 Proceedings of the Third International Conference on Advances in Pattern Recognition - Volume Part I
  • Year: 2005

Abstract

Record linkage is an important application area of text pattern analysis. In this paper we propose a new sequence labeling method that can be used to extract entities from a string for record linkage. The proposed method combines a classifier and a Hidden Markov Model (HMM) to utilize both syntactical and textual information from the string. We first describe the model used in the proposed method and then discuss the parameter estimation for this model. The proposed method incorporates a classifier for handling textual information and integrates the classifier with the HMM statistically by estimating the error probability of the classifier. We applied the proposed method to the bibliographic sequence labeling problem, in which bibliographic components are extracted from reference strings. We compared the proposed method with other methods that use textual or syntactical information alone and showed that the proposed method outperforms them.
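
The abstract does not spell out the model, so the following is only a minimal sketch of one way a classifier and an HMM could be combined through the classifier's error probability. It assumes that a per-token classifier has already assigned a label to each token of a reference string, that the HMM states are the true bibliographic components, and that the emission probability is the estimated chance of the classifier outputting a given label in a given true state (a confusion matrix). The function names, the state set, and all numeric values are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math


def estimate_confusion(true_labels, pred_labels, states, smoothing=1.0):
    """Estimate P(classifier outputs c | true state s) from labeled data.

    Add-`smoothing` counts keep every probability strictly positive.
    """
    counts = {s: {c: smoothing for c in states} for s in states}
    for t, p in zip(true_labels, pred_labels):
        counts[t][p] += 1
    return {
        s: {c: counts[s][c] / sum(counts[s].values()) for c in states}
        for s in states
    }


def viterbi(pred_labels, states, start_p, trans_p, confusion):
    """Most probable true-state sequence given the classifier's outputs."""

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # scores[s]: best log-probability of any path that ends in state s
    scores = {
        s: log(start_p[s]) + log(confusion[s][pred_labels[0]]) for s in states
    }
    backptrs = []
    for c in pred_labels[1:]:
        new_scores, ptr = {}, {}
        for s in states:
            best_prev = max(states, key=lambda r: scores[r] + log(trans_p[r][s]))
            new_scores[s] = (
                scores[best_prev] + log(trans_p[best_prev][s]) + log(confusion[s][c])
            )
            ptr[s] = best_prev
        scores = new_scores
        backptrs.append(ptr)

    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: scores[s])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Toy parameters (assumed values, for illustration only).
STATES = ["author", "title", "year"]
START = {"author": 0.9, "title": 0.05, "year": 0.05}
TRANS = {
    "author": {"author": 0.6, "title": 0.35, "year": 0.05},
    "title": {"author": 0.05, "title": 0.7, "year": 0.25},
    "year": {"author": 0.1, "title": 0.1, "year": 0.8},
}
CONFUSION = {
    "author": {"author": 0.8, "title": 0.15, "year": 0.05},
    "title": {"author": 0.2, "title": 0.75, "year": 0.05},
    "year": {"author": 0.05, "title": 0.05, "year": 0.9},
}

# Classifier output per token; the fourth label is an isolated mistake.
predicted = ["author", "author", "title", "author", "title", "year"]
decoded = viterbi(predicted, STATES, START, TRANS, CONFUSION)
print(decoded)
```

With these toy values the decoder relabels the fourth token as `title`, because a jump from title back to author is unlikely under the transition probabilities while a title token misclassified as author is comparatively common under the confusion matrix. In practice the confusion matrix would be estimated from held-out labeled data, for example with `estimate_confusion`, rather than set by hand.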