An improved hidden Markov model for literature metadata extraction

Authors:
Bin-Ge Cui;Xin Chen
Affiliations:
College of Information Science and Engineering, Shandong University of Science and Technology, Qingdao, China;College of Information Science and Engineering, Shandong University of Science and Technology, Qingdao, China
Venue:
ICIC'10 Proceedings of the 6th international conference on Advanced intelligent computing theories and applications: intelligent computing
Year:
2010

Citing 2
Cited 0

CiteSeer: an automatic citation indexing system
Proceedings of the third ACM conference on Digital libraries

Scientific literature metadata extraction based on HMM
CDVE'09 Proceedings of the 6th international conference on Cooperative design, visualization, and engineering

Abstract

In this paper, we propose an improved hidden Markov model (HMM) for extracting metadata from academic literature. We built a dataset of 458 papers from the VLDB conferences that records the visual features of text blocks. Our approach rests on the assumption that text blocks on the same line share the same state (information type); this assumption holds in more than 98% of cases. Consequently, the probability of a transition between identical states is much higher for blocks on the same line than for blocks on different lines. Based on this observation, we add a second state transition matrix to the HMM and modify the Viterbi algorithm accordingly. Experiments show that our extraction accuracy is superior to that of existing approaches.
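
The abstract does not spell out how the extra transition matrix enters the decoder, so the following is only a minimal sketch of one plausible reading: Viterbi decoding that switches between a same-line matrix and a new-line matrix depending on the layout of each text block. The function name `viterbi_two_matrices`, the two-state setup (0 = title, 1 = author), and every probability below are illustrative assumptions, not values or code from the paper.

```python
import math


def viterbi_two_matrices(log_pi, log_a_same, log_a_diff, log_emit, same_line):
    """Most likely state path when the transition matrix depends on line layout.

    log_pi[s]        : log initial probability of state s
    log_a_same[r][s] : log P(r -> s) when block t is on the same line as block t-1
    log_a_diff[r][s] : log P(r -> s) when block t starts a new line
    log_emit[t][s]   : log emission score of block t under state s
    same_line[t]     : True if block t shares a line with block t-1 (index 0 is ignored)
    """
    n_states = len(log_pi)
    n_blocks = len(log_emit)
    if n_blocks == 0:
        return []
    score = [log_pi[s] + log_emit[0][s] for s in range(n_states)]
    back = []  # back[t-1][s] = best predecessor of state s at block t
    for t in range(1, n_blocks):
        # The only change from textbook Viterbi: pick the matrix per block.
        trans = log_a_same if same_line[t] else log_a_diff
        new_score, pointers = [], []
        for s in range(n_states):
            best_prev = max(range(n_states), key=lambda r: score[r] + trans[r][s])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + trans[best_prev][s] + log_emit[t][s])
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def logs(rows):
    """Convert a matrix of probabilities to log probabilities."""
    return [[math.log(p) for p in row] for row in rows]


# Hypothetical toy model: state 0 = title, state 1 = author.
log_pi = [math.log(0.9), math.log(0.1)]
log_a_same = logs([[0.98, 0.02], [0.02, 0.98]])  # same line: states rarely change
log_a_diff = logs([[0.5, 0.5], [0.5, 0.5]])      # new line: no preference
log_emit = logs([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]])  # three text blocks

# Block 1 sits on the same line as block 0; block 2 starts a new line.
path = viterbi_two_matrices(log_pi, log_a_same, log_a_diff, log_emit,
                            [False, True, False])
# Baseline: treat every block as a new line, i.e. a single transition matrix.
baseline = viterbi_two_matrices(log_pi, log_a_same, log_a_diff, log_emit,
                                [False, False, False])
print(path, baseline)
```

In this toy run the ambiguous middle block is labeled as title when the same-line matrix applies and as author when it does not, which illustrates how the same-line assumption can override a weak emission signal.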