Citations play an essential role in navigating academic literature and following chains of evidence in research. With the growing availability of large digital archives of scientific papers, the automated extraction and analysis of citations are becoming increasingly relevant. However, existing approaches to citation extraction still fall short of the accuracy required to build more sophisticated and reliable tools for citation analysis and corpus navigation. In this paper, we present techniques for high-accuracy extraction of citations and references from academic papers. By collecting multiple sources of evidence about entities from documents, and by integrating citation extraction, reference segmentation, and citation-reference matching, we significantly improve performance in subtasks including citation identification, author named entity recognition, and citation-reference matching. Applying our algorithm to previously unseen documents, we demonstrate F-measures of 0.980 for citation extraction, 0.983 for author named entity recognition, and 0.948 for citation-reference matching.
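To make the subtasks and the evaluation metric concrete, the following is a minimal sketch in Python, not the method presented in the paper. It assumes author-year citations in a single hypothetical style, matches them to reference strings by naive surname-and-year lookup, and computes the balanced F-measure (the harmonic mean of precision and recall). The names `CITATION_RE`, `extract_citations`, `match_reference`, and `f_measure`, as well as the sample text and references, are illustrative assumptions.

```python
import re

# Hypothetical pattern for author-year citations such as "(Smith, 2004)",
# "(Chen and Wu, 2008)" or "(Lee et al., 2006)". Real papers use many more
# citation styles than this toy pattern covers.
CITATION_RE = re.compile(
    r"\(([A-Z][A-Za-z-]+)(?: (?:and|&) [A-Z][A-Za-z-]+| et al\.)?,? (\d{4})\)"
)


def extract_citations(text):
    """Return (first_author_surname, year) pairs found in running text."""
    return [(m.group(1), m.group(2)) for m in CITATION_RE.finditer(text)]


def match_reference(citation, references):
    """Return the index of the first reference string that contains both
    the surname and the year, or None if no reference matches."""
    surname, year = citation
    for index, reference in enumerate(references):
        if surname in reference and year in reference:
            return index
    return None


def f_measure(predicted, gold):
    """Balanced F-measure (F1) of a predicted set against a gold set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Made-up sample data, for illustration only.
sample_text = (
    "Prior work (Smith, 2004) and (Lee et al., 2006) differs from "
    "(Chen and Wu, 2008)."
)
sample_references = [
    "Smith, J. 2004. An example reference.",
    "Chen, L. and Wu, K. 2008. Another example reference.",
]

citations = extract_citations(sample_text)
links = [match_reference(c, sample_references) for c in citations]
```

In this toy run the citation to Lee et al. has no entry in the sample reference list, so `match_reference` returns `None` for it. A system of the kind the abstract describes would combine several sources of evidence instead of relying on a single pattern and substring test.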