Web-based citation parsing, correction and augmentation

Authors:
Liangcai Gao;Xixi Qi;Zhi Tang;Xiaofan Lin;Ying Liu
Affiliations:
Peking University, Beijing, China;Peking University, Beijing, China;State Key Laboratory of Digital Publishing Technology, Beijing, China;A9.com, Palo Alto, CA, USA;Korea Advanced Institute of Science and Technology, Daejeon, South Korea
Venue:
Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries
Year:
2012

Citing 15

CiteSeer: an autonomous Web agent for automatic retrieval and identification of interesting publications
AGENTS '98 Proceedings of the second international conference on Autonomous agents

Automatic document metadata extraction using support vector machines
Proceedings of the 3rd ACM/IEEE-CS joint conference on Digital libraries

Bibliographic attribute extraction from erroneous references based on a statistical model
Proceedings of the 3rd ACM/IEEE-CS joint conference on Digital libraries

A Segmentation Method for Bibliographic References by Contextual Tagging of Fields
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 1

Metadata Propagation in the Web Using Co-Citations
WI '05 Proceedings of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence

Reference metadata extraction using a hierarchical knowledge representation framework
Decision Support Systems

FLUX-CIM: flexible unsupervised extraction of citation metadata
Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries

Bibliographic Attributes Extraction with Layer-upon-Layer Tagging
ICDAR '07 Proceedings of the Ninth International Conference on Document Analysis and Recognition - Volume 02

Are your citations clean?
Communications of the ACM

A simple method for citation metadata extraction using hidden markov models
Proceedings of the 8th ACM/IEEE-CS joint conference on Digital libraries

BibPro: A Citation Parser Based on Sequence Alignment Techniques
AINAW '08 Proceedings of the 22nd International Conference on Advanced Information Networking and Applications - Workshops

Automatic metadata generation using associative networks
ACM Transactions on Information Systems (TOIS)

CEBBIP: a parser of bibliographic information in chinese electronic books
Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries

Metadata Extraction from PDF Papers for Digital Library Ingest
ICDAR '09 Proceedings of the 2009 10th International Conference on Document Analysis and Recognition

PDFMeat: managing publications on the semantic desktop
Proceedings of the 20th ACM international conference on Information and knowledge management

Cited 1

Extracting and matching authors and affiliations in scholarly documents
Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries

Abstract

Given the considerable value of citation metadata, many methods have been proposed to automate Citation Metadata Extraction (CME). Existing methods rely primarily on content analysis of the citation text. However, the results of such content-based methods are often unreliable. Moreover, the extracted citation metadata is only a small part of the relevant metadata spread across the Internet. In contrast to content-based CME methods, this paper proposes a Web-based CME approach and a citation enriching system called BibAll, which corrects the parsing results of content-based CME methods and augments citation metadata by leveraging relevant bibliographic data from digital repositories and cited-by publications on the Web. BibAll consists of four main components: citation parsing, Web-based bibliographic data retrieval, irrelevant bibliographic data filtering, and relevant bibliographic data integration. The system has been tested on the publicly available FLUX-CIM dataset. Experimental results show that BibAll significantly improves citation parsing accuracy and augments the metadata of the original citation.
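
The four components named in the abstract can be pictured as a small pipeline. The Python sketch below is only an illustration of that flow, not BibAll's actual implementation: the abstract does not describe the algorithms used, so every function name, the in-memory TOY_REPOSITORY (standing in for Web retrieval), the Jaccard title-similarity filter, and the majority-vote integration rule are assumptions made for this example, and the citation data is invented.

```python
import re
from collections import Counter

# Hypothetical stand-in for bibliographic records found on the Web (toy data).
TOY_REPOSITORY = [
    {"title": "A toy example of citation parsing", "authors": "Doe J, Roe R",
     "year": "2012", "venue": "Toy Conference"},
    {"title": "A toy example of citation parsing", "year": "2012", "pages": "1-2"},
    {"title": "Parsing street addresses", "year": "2010"},
]


def tokens(text):
    """Lowercased alphanumeric word set, used for matching titles."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def parse_citation(text):
    """Step 1 (assumed): naive content-based parse of 'Authors. Title. Venue, Year.'"""
    parts = [p.strip() for p in text.split(".") if p.strip()]
    record = {}
    if len(parts) >= 1:
        record["authors"] = parts[0]
    if len(parts) >= 2:
        record["title"] = parts[1]
    year = re.search(r"\b(19|20)\d{2}\b", text)
    if year:
        record["year"] = year.group(0)
    return record


def retrieve_candidates(record, repository):
    """Step 2 (assumed): return records sharing at least one title token."""
    query = tokens(record.get("title", ""))
    return [r for r in repository if query & tokens(r.get("title", ""))]


def jaccard(a, b):
    """Jaccard similarity between the token sets of two strings."""
    a, b = tokens(a), tokens(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def filter_relevant(record, candidates, threshold=0.5):
    """Step 3 (assumed): drop candidates whose title is too dissimilar."""
    title = record.get("title", "")
    return [c for c in candidates if jaccard(title, c.get("title", "")) >= threshold]


def integrate(record, relevant):
    """Step 4 (assumed): correct parsed fields by strict majority vote and
    add fields the parse did not produce."""
    merged = dict(record)
    for field in {f for c in relevant for f in c}:
        votes = Counter(c[field] for c in relevant if field in c)
        value, count = votes.most_common(1)[0]
        if field not in merged or count > len(relevant) / 2:
            merged[field] = value
    return merged


def enrich(citation_text, repository):
    """Run the four steps in order on one citation string."""
    record = parse_citation(citation_text)
    candidates = retrieve_candidates(record, repository)
    relevant = filter_relevant(record, candidates)
    return integrate(record, relevant)


if __name__ == "__main__":
    # The misspelled title is deliberate, to show the correction step.
    print(enrich("Doe J, Roe R. A toy exampel of citation parsing. Toy Conf, 2012.",
                 TOY_REPOSITORY))
```

In this toy run the misspelled title is replaced because both relevant records agree on the correct spelling, the unrelated record is filtered out by the similarity threshold, and fields missing from the parse (venue, pages) are filled in from the retrieved records. A real system would query digital repositories and cited-by publications rather than a fixed list, and would likely use a more robust parser and matching rule.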