Metadata Extraction from Chinese Research Papers Based on Conditional Random Fields

Authors:
Jiangde Yu;Xiaozhong Fan
Affiliations:
Beijing Institute of Technology, Beijing;Beijing Institute of Technology, Beijing
Venue:
FSKD '07 Proceedings of the Fourth International Conference on Fuzzy Systems and Knowledge Discovery - Volume 01
Year:
2007

Abstract

As more and more research papers appear on the Internet, accurately extracting metadata from the headers and citations of those papers becomes increasingly important. This paper proposes a method based on Conditional Random Fields (CRFs) for automatically extracting metadata from Chinese research papers. The key components of the algorithm are parameter estimation and feature selection. We employ the L-BFGS algorithm for parameter estimation, analyze three classes of features, and perform feature induction. During processing, the method uses format information (list separators and special labels) to segment the text, and then applies CRFs to extract metadata from the segmented text. We compare the performance of CRF-based metadata extraction on English and Chinese datasets, and we also compare two models, CRFs and hidden Markov models (HMMs), on Chinese datasets. Experimental results show that CRFs perform better than HMMs.
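The segmentation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the separator set, the special labels, the function names segment and features, and the feature names are all hypothetical, since the abstract does not list the ones actually used. The sketch only shows how a header line might be split at list separators and special labels, and how simple per-chunk features could then be produced as input to a sequence labeler such as a CRF.

```python
import re

# Hypothetical separator and label sets; the abstract does not specify them.
LIST_SEPARATORS = ",;，；、"
SPECIAL_LABELS = ("摘要", "关键词", "Abstract", "Keywords")


def segment(line):
    """Split a header line into chunks.

    A leading special label (if any) becomes its own chunk; the rest of
    the line is split at list separators and empty pieces are dropped.
    """
    chunks = []
    for label in SPECIAL_LABELS:
        if line.startswith(label):
            chunks.append(label)
            # Drop the label and any colon or space that follows it.
            line = line[len(label):].lstrip(":： ")
            break
    parts = re.split("[" + re.escape(LIST_SEPARATORS) + "]", line)
    chunks.extend(p.strip() for p in parts if p.strip())
    return chunks


def features(chunks, i):
    """Return illustrative (assumed) features for the chunk at index i."""
    chunk = chunks[i]
    return {
        "is_special_label": chunk in SPECIAL_LABELS,
        "has_digit": any(c.isdigit() for c in chunk),
        "length": len(chunk),
        "prev_is_special_label": i > 0 and chunks[i - 1] in SPECIAL_LABELS,
    }


example = segment("关键词: 元数据抽取; 条件随机场, CRFs")
example_features = [features(example, i) for i in range(len(example))]
```

In a full pipeline, the per-chunk feature dictionaries would be passed to a trained CRF, which assigns a metadata field label to each chunk. Training with L-BFGS and the feature induction mentioned in the abstract are outside the scope of this sketch.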