Metadata Extraction from Chinese Research Papers Based on Conditional Random Fields

  • Authors:
  • Jiangde Yu; Xiaozhong Fan

  • Affiliations:
  • Beijing Institute of Technology, Beijing; Beijing Institute of Technology, Beijing

  • Venue:
  • FSKD '07 Proceedings of the Fourth International Conference on Fuzzy Systems and Knowledge Discovery - Volume 01
  • Year:
  • 2007

Abstract

As more and more research papers appear on the Internet, accurately extracting metadata from their headers and citations becomes increasingly important. This paper proposes a method based on Conditional Random Fields (CRFs) for automatically extracting metadata from Chinese research papers. The key components of the algorithm are parameter estimation and feature selection. We use the L-BFGS algorithm for parameter estimation, analyze three classes of features, and perform feature induction. During processing, the method uses format information, namely list separators and special labels, to segment the text, and then applies CRFs to extract metadata from the papers. We compare the performance of CRF-based metadata extraction on English and Chinese datasets, and we also compare two models, CRFs and hidden Markov models (HMMs), on Chinese datasets. Experimental results show that CRFs perform better than HMMs.
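
The segmentation step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the separator set, the function names (`segment`, `chunk_features`, `featurize`), and the features themselves are assumptions chosen for illustration, since the abstract does not list the actual separators, special labels, or the three feature classes. The sketch splits a header or citation string at separator characters and builds one feature dictionary per chunk, which is the kind of input a CRF sequence labeller would consume.

```python
import re

# Hypothetical separator set (ASCII and full-width punctuation).
# The paper's actual separator list is not given in the abstract.
SEPARATORS = ",.;:，。；：、"
_SPLIT_RE = re.compile("[" + re.escape(SEPARATORS) + "]+")


def segment(text):
    """Split a header or citation string into chunks at separator characters."""
    return [chunk.strip() for chunk in _SPLIT_RE.split(text) if chunk.strip()]


def chunk_features(chunks, i):
    """Build an illustrative feature dict for chunk i.

    These features are assumptions, not the feature classes used in the paper.
    """
    chunk = chunks[i]
    return {
        "position": i,
        "is_first": i == 0,
        "is_last": i == len(chunks) - 1,
        "has_digit": any(ch.isdigit() for ch in chunk),
        "looks_like_year": re.fullmatch(r"(19|20)\d{2}", chunk) is not None,
        "length": len(chunk),
    }


def featurize(text):
    """Return the chunks of `text` and one feature dict per chunk."""
    chunks = segment(text)
    return chunks, [chunk_features(chunks, i) for i in range(len(chunks))]


if __name__ == "__main__":
    chunks, feats = featurize("于江德，樊孝忠。FSKD，2007")
    for chunk, feat in zip(chunks, feats):
        print(chunk, feat)
```

Splitting on every period is a simplification: it would also break abbreviations and initials, so a practical system would need more careful rules. The resulting per-chunk feature dictionaries could then be passed to any linear-chain CRF trainer, for example one that estimates parameters with L-BFGS as the abstract describes.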