BibPro: A Citation Parser Based on Sequence Alignment Techniques

  • Authors:
  • Chien-Chih Chen; Kai-Hsiang Yang; Hung-Yu Kao; Jan-Ming Ho

  • Venue:
  • AINAW '08 Proceedings of the 22nd International Conference on Advanced Information Networking and Applications - Workshops
  • Year:
  • 2008

Abstract

The dramatic increase in the number of academic publications has led to a growing demand for efficient organization of these resources to meet researchers’ specific needs. As a result, a number of network services have compiled databases from public resources scattered across the Internet. However, publications in different conferences and journals follow different citation formats, so the problem of accurately extracting metadata from a publication string has attracted a great deal of attention in recent years. In this paper, we extend our previous work and propose a new tool, BibPro, that extracts metadata from citation strings by using a gene sequence alignment tool. BibPro improves on our previous tool in two ways. First, BibPro does not need knowledge databases (e.g., an author name database) to generate feature indices for citation strings; instead, only the order of punctuation marks in a citation string is used to represent its format. Second, BibPro employs the Basic Local Alignment Search Tool (BLAST) to find the most similar citation formats in the database and then uses the Needleman-Wunsch algorithm to choose the best-fit citation format as the extraction template. Our experimental results show that, in terms of precision and recall, BibPro outperforms other existing systems (e.g., INFOMAP and ParaCite) and scales well.
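
To make the two ideas in the abstract concrete, the following is a minimal Python sketch, not BibPro's actual implementation. It assumes that a citation's format can be approximated by the ordered string of its ASCII punctuation marks, and it scores candidate formats with a textbook Needleman-Wunsch global alignment. The function names, the scoring values (match +1, mismatch -1, gap -1), and the two template signatures are hypothetical choices made for illustration. The BLAST pre-filtering step is omitted; here every template is scored directly.

```python
import string


def punctuation_signature(citation: str) -> str:
    """Return the ASCII punctuation marks of a citation, in order of appearance."""
    return "".join(ch for ch in citation if ch in string.punctuation)


def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Global alignment score of a and b (linear gap penalty).

    Only the previous row of the dynamic-programming table is kept,
    since the sketch needs the score and not the alignment itself.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def best_template(citation: str, templates: dict) -> str:
    """Pick the template whose signature aligns best with the citation's."""
    signature = punctuation_signature(citation)
    return max(templates,
               key=lambda name: needleman_wunsch_score(signature, templates[name]))


# Hypothetical template database: format name -> punctuation signature.
TEMPLATES = {
    "comma_style": ",..,.",
    "paren_style": ".().:-.",
}

if __name__ == "__main__":
    example = "Chen, C. Title. Venue, 2008."
    print(punctuation_signature(example))
    print(best_template(example, TEMPLATES))
```

In a fuller system, the chosen template would then map each punctuation-delimited segment to a metadata field (author, title, venue, year); that mapping is outside the scope of this sketch.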