An improved hidden Markov model for literature metadata extraction

  • Authors:
  • Bin-Ge Cui; Xin Chen

  • Affiliations:
  • College of Information Science and Engineering, Shandong University of Science and Technology, Qingdao, China;College of Information Science and Engineering, Shandong University of Science and Technology, Qingdao, China

  • Venue:
  • ICIC'10 Proceedings of the 6th international conference on Advanced intelligent computing theories and applications: intelligent computing
  • Year:
  • 2010

Abstract

In this paper, we propose an improved Hidden Markov Model (HMM) for extracting metadata from academic literature. We built a dataset of 458 papers from the VLDB conferences that records the visual features of text blocks. Our approach rests on the assumption that text blocks on the same line share the same state (information type); this assumption holds in more than 98% of cases. Consequently, the probability of a transition between identical states is much higher within a line than across lines. Based on this observation, we add a second state transition matrix to the HMM and modify the Viterbi algorithm accordingly. Experiments show that our extraction accuracy exceeds that of existing approaches.
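The abstract does not give the details of the modified Viterbi algorithm, so the following is only a minimal sketch of one plausible reading: at each step the decoder picks between two transition matrices depending on whether the current text block sits on the same line as the previous one. The function name `viterbi_two_matrix`, the `same_line` flags, the two states, the observation symbols, and every probability below are illustrative assumptions, not values or names from the paper.

```python
import math


def _log(p):
    """Log of a probability, mapping 0 to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi_two_matrix(obs, same_line, states, start_p,
                       trans_same, trans_diff, emit_p):
    """Most likely state sequence for a non-empty list of text blocks.

    obs        : one observation symbol per text block
    same_line  : same_line[t] is True when block t is on the same line
                 as block t-1 (same_line[0] is ignored)
    trans_same : transition probabilities used within a line
    trans_diff : transition probabilities used across a line break
    """
    # Initialise with the start distribution and the first emission.
    score = {s: _log(start_p[s]) + _log(emit_p[s][obs[0]]) for s in states}
    back = []  # back[t-1][s] = best predecessor of state s at step t

    for t in range(1, len(obs)):
        # The only change from standard Viterbi: choose the matrix per step.
        trans = trans_same if same_line[t] else trans_diff
        new_score, pointers = {}, {}
        for s in states:
            best_prev = max(states, key=lambda r: score[r] + _log(trans[r][s]))
            new_score[s] = (score[best_prev] + _log(trans[best_prev][s])
                            + _log(emit_p[s][obs[t]]))
            pointers[s] = best_prev
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy model (all numbers invented for illustration).
STATES = ["title", "author"]
START_P = {"title": 0.9, "author": 0.1}
# Within a line, staying in the same state is strongly favoured.
TRANS_SAME = {"title": {"title": 0.98, "author": 0.02},
              "author": {"title": 0.02, "author": 0.98}}
# Across a line break, a change of state is much more likely.
TRANS_DIFF = {"title": {"title": 0.3, "author": 0.7},
              "author": {"title": 0.1, "author": 0.9}}
EMIT_P = {"title": {"large": 0.8, "small": 0.2},
          "author": {"large": 0.3, "small": 0.7}}

blocks = ["large", "small", "small"]
with_lines = viterbi_two_matrix(blocks, [False, True, False], STATES,
                                START_P, TRANS_SAME, TRANS_DIFF, EMIT_P)
without_lines = viterbi_two_matrix(blocks, [False, False, False], STATES,
                                   START_P, TRANS_SAME, TRANS_DIFF, EMIT_P)
print(with_lines)     # ['title', 'title', 'author']
print(without_lines)  # ['title', 'author', 'author']
```

In this toy run the second block has a small font, which on its own suggests an author block. Because it shares a line with the first block, the within-line matrix keeps it in the title state, whereas decoding with the cross-line matrix alone labels it as author. Log probabilities are used to avoid underflow on long block sequences.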