Web-based citation parsing, correction and augmentation

  • Authors:
  • Liangcai Gao;Xixi Qi;Zhi Tang;Xiaofan Lin;Ying Liu

  • Affiliations:
  • Peking University, Beijing, China;Peking University, Beijing, China;State Key Laboratory of Digital Publishing Technology, Beijing, China;A9.com, Palo Alto, CA, USA;Korea Advanced Institute of Science and Technology, Daejeon, South Korea

  • Venue:
  • Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Considering the tremendous value of citation metadata, many methods have been proposed to automate Citation Metadata Extraction (CME). The existing methods primarily rely on the content analysis of citation text. However, the results from such content-based methods are often unreliable. Moreover, the extracted citation metadata is only a small part of the relevant metadata that spreads across the Internet. As opposed to the content-based CME methods, this paper proposes a Web-based CME approach and a citation enriching system, called as BibAll, which is capable of correcting the parsing results of content-based CME methods and augmenting citation metadata by leveraging relevant bibliographic data from digital repositories and cited-by publications on the Web. BibAll consists of four main components: citation parsing, Web-based bibliographic data retrieval, irrelevant bibliographic data filtering, and relevant bibliographic data integration. The system has been tested on the publicly available FLUX-CIM dataset. Experimental results show that BibAll significantly improves the citation parsing accuracy and augments the metadata of the original citation.