Self-supervised learning approach for extracting citation information on the web

  • Authors:
  • Dat T. Huynh; Wen Hua

  • Affiliations:
  • School of Information Technology and Electrical Engineering, The University of Queensland, Australia; School of Information Technology and Electrical Engineering, The University of Queensland, Australia

  • Venue:
  • APWeb'12: Proceedings of the 14th Asia-Pacific International Conference on Web Technologies and Applications
  • Year:
  • 2012

Abstract

In this paper, we propose a framework for automatically training a model to extract citation information on the web. Manually constructing labeled training data for an extraction model is tedious and time-consuming, and the resulting data is difficult to apply across the many citation styles, each with different types of entities. To eliminate the need for manually labeled training data, we exploit a knowledge base of the citation domain together with web search to derive labeled training data automatically. Our experiments show that combining the knowledge base, heuristics and statistical methods can automate the extraction process and achieve good performance.
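The core idea of deriving labeled training data from a knowledge base instead of manual annotation can be illustrated with a minimal sketch. This is not the authors' implementation: the toy knowledge base `KB`, the function `label_citation`, the token-level label scheme (field name or `O`), and the four-digit year regex used as a heuristic are all hypothetical, and the web search step is omitted entirely. Each token of a raw citation string receives a field label when it belongs to an exact match against a known knowledge-base value, and `O` otherwise.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy knowledge base: field name -> known entity values.
# A real system would draw these from a large citation-domain knowledge base.
KB: Dict[str, List[str]] = {
    "author": ["Dat T. Huynh", "Wen Hua"],
    "venue": ["APWeb"],
}

# Hypothetical heuristic: a standalone four-digit token 19xx/20xx is a year.
YEAR_RE = re.compile(r"^(19|20)\d{2}$")


def tokenize(text: str) -> List[str]:
    # Split into word tokens and standalone punctuation tokens.
    return re.findall(r"\w+|[^\w\s]", text)


def label_citation(citation: str, kb: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Assign a field label to each token by exact matching against the KB."""
    tokens = tokenize(citation)
    labels = ["O"] * len(tokens)

    # Knowledge-base matching: label every exact, not-yet-labeled token span.
    for field, values in kb.items():
        for value in values:
            value_tokens = tokenize(value)
            n = len(value_tokens)
            for i in range(len(tokens) - n + 1):
                span_free = all(lab == "O" for lab in labels[i:i + n])
                if tokens[i:i + n] == value_tokens and span_free:
                    for j in range(i, i + n):
                        labels[j] = field

    # Heuristic pass: label remaining year-like tokens.
    for i, tok in enumerate(tokens):
        if labels[i] == "O" and YEAR_RE.match(tok):
            labels[i] = "year"

    return list(zip(tokens, labels))


if __name__ == "__main__":
    text = "Dat T. Huynh and Wen Hua. Self-supervised learning. APWeb, 2012."
    for token, label in label_citation(text, KB):
        print(f"{token}\t{label}")
```

The resulting (token, label) pairs could then serve as training data for a statistical sequence labeler. Exact string matching is the simplest possible choice here; it would miss variants such as abbreviated author names or reordered venue titles, which is where additional heuristics or looser matching would be needed.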