FireCite: lightweight real-time reference string extraction from webpages

Authors:
Ching Hoi Andy Hong;Jesse Prabawa Gozali;Min-Yen Kan
Affiliations:
National University of Singapore;National University of Singapore;National University of Singapore
Venue:
NLPIR4DL '09 Proceedings of the 2009 Workshop on Text and Citation Analysis for Scholarly Digital Libraries
Year:
2009

Abstract

We present FireCite, a Mozilla Firefox browser extension that helps scholars assess and manage scholarly references on the web by automatically detecting and parsing reference strings in real time. FireCite has two main components: 1) a reference string recognizer with a high recall of 96%, and 2) a reference string parser that processes HTML web pages with an overall F1 of 0.878 and plaintext reference strings with an overall F1 of 0.97. In our preliminary evaluation, we presented our FireCite prototype to four academics in separate unstructured interviews. Their positive feedback provides evidence for the desirability of FireCite's citation management capabilities.