Scientific literature metadata extraction based on HMM
CDVE'09 Proceedings of the 6th international conference on Cooperative design, visualization, and engineering
In this paper, we propose an improved Hidden Markov Model (HMM) for extracting metadata from academic papers. We have built a dataset of 458 papers from the VLDB conferences, which records the visual features of their text blocks. Our approach rests on the assumption that text blocks on the same line share the same state (information type). This assumption holds in more than 98% of cases. Consequently, the probability of a transition between identical states is much larger within a line than across lines. Based on this observation, we add a second state transition matrix to the HMM and modify the Viterbi algorithm accordingly. Experiments show that our extraction accuracy exceeds that of existing approaches.
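The abstract does not give the details of the modified Viterbi algorithm, so the following is only a minimal sketch of one way the idea could be realized: Viterbi decoding that switches between two transition matrices depending on whether a text block lies on the same line as its predecessor. The function name `viterbi_two_matrices`, the parameters `A_same` and `A_diff`, the `same_line` flags, and the use of discrete observation symbols are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def viterbi_two_matrices(obs, same_line, pi, A_same, A_diff, B):
    """Most likely state path for a sequence of text blocks.

    obs       : list of observation symbol indices, one per text block
    same_line : same_line[t] is True if block t is on the same line as
                block t-1 (same_line[0] is ignored)
    pi        : initial state probabilities, shape (S,)
    A_same    : transition matrix used within a line, shape (S, S)
    A_diff    : transition matrix used across a line break, shape (S, S)
    B         : emission probabilities, shape (S, number of symbols)
    """
    eps = 1e-12  # avoids log(0); hypothetical smoothing choice
    log_pi = np.log(np.asarray(pi, dtype=float) + eps)
    log_A_same = np.log(np.asarray(A_same, dtype=float) + eps)
    log_A_diff = np.log(np.asarray(A_diff, dtype=float) + eps)
    log_B = np.log(np.asarray(B, dtype=float) + eps)

    T, S = len(obs), len(log_pi)
    delta = np.zeros((T, S))            # best log-score ending in each state
    back = np.zeros((T, S), dtype=int)  # best predecessor for each state

    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        # The only change from standard Viterbi: pick the transition
        # matrix according to the layout of the current block.
        log_A = log_A_same if same_line[t] else log_A_diff
        scores = delta[t - 1][:, None] + log_A  # scores[i, j]: state i -> j
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[:, obs[t]]

    # Trace the best path backwards from the best final state.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

With `A_same` concentrated on its diagonal, blocks on one line are strongly encouraged to keep the same label, while `A_diff` lets the label change freely at a line break. Passing `same_line` as all `False` reduces the sketch to ordinary Viterbi decoding with `A_diff`.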