CEBBIP: a parser of bibliographic information in Chinese electronic books
Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries
Structure extraction from PDF-based book documents
Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries
Web-based citation parsing, correction and augmentation
Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries
Improved bibliographic reference parsing based on repeated patterns
TPDL'12 Proceedings of the Second international conference on Theory and Practice of Digital Libraries
The dramatic increase in the number of academic publications has led to a growing demand for organizing these resources efficiently to meet researchers’ specific needs. As a result, a number of online services have compiled databases from public resources scattered across the Internet. However, because different conferences and journals follow different citation formats, accurately extracting metadata from citation strings is difficult, and the problem has attracted considerable attention in recent years. In this paper, we extend our previous work and propose a new tool, BibPro, that extracts metadata from citation strings by using a gene sequence alignment tool. BibPro differs from our previous tool in two main ways. First, BibPro does not need knowledge databases (e.g., an author name database) to generate feature indices for citation strings; instead, it represents the format of a citation string solely by the order of its punctuation marks. Second, BibPro employs the Basic Local Alignment Search Tool (BLAST) to find the most similar citation formats in its database and then uses the Needleman-Wunsch algorithm to choose the best-fit format as the extraction template. Our experimental results show that BibPro outperforms existing systems (e.g., INFOMAP and ParaCite) in terms of precision and recall, and that it scales well.
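The format-matching idea can be illustrated with a minimal Python sketch. It reduces a citation string to the ordered sequence of its punctuation marks and scores that sequence against stored format signatures with Needleman-Wunsch global alignment. This is not BibPro's implementation. The punctuation set, the match/mismatch/gap weights, the two-entry template database, and the names `punctuation_signature`, `needleman_wunsch`, `best_template`, and `TEMPLATES` are all hypothetical. The BLAST candidate search is also omitted, so every stored template is scored directly.

```python
# Minimal sketch of punctuation-based citation-format matching.
# Assumptions (not from the paper): the punctuation set, the scoring
# weights, and the toy template database below are illustrative only.
# The BLAST candidate-search step is skipped; every template is scored.

PUNCTUATION = frozenset(',.;:()[]"\'-')


def punctuation_signature(citation: str) -> str:
    """Represent a citation's format by the order of its punctuation marks."""
    return "".join(ch for ch in citation if ch in PUNCTUATION)


def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -1) -> int:
    """Return the optimal global alignment score of sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty sequence costs one gap per symbol.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diagonal,
                              score[i - 1][j] + gap,   # a[i-1] aligned to a gap
                              score[i][j - 1] + gap)   # b[j-1] aligned to a gap
    return score[-1][-1]


def best_template(citation: str, templates: dict[str, str]) -> str:
    """Pick the template whose signature aligns best with the citation's."""
    signature = punctuation_signature(citation)
    return max(templates, key=lambda name: needleman_wunsch(signature, templates[name]))


# Hypothetical template database: format name -> punctuation signature.
TEMPLATES = {
    "author-year": punctuation_signature("Doe, J. (2010). Title. Venue."),
    "numbered": punctuation_signature('J. Doe, "Title," Venue, 2010.'),
}

if __name__ == "__main__":
    query = "Smith, A. (2012). Another title. Journal."
    print(punctuation_signature(query))     # prints ,.()...
    print(best_template(query, TEMPLATES))  # prints author-year
```

Because the query shares its punctuation order with the "author-year" entry, that template gets the highest alignment score even though names, titles, and years all differ. In the described system, the selected format then serves as the extraction template. Each Needleman-Wunsch comparison costs time proportional to the product of the two sequence lengths, so a fast candidate filter such as BLAST becomes important once the template database is large.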