In this paper, we propose a method, based on an extended hidden Markov model, for extracting bibliographic attributes from reference strings captured using Optical Character Recognition (OCR). Bibliographic attribute extraction can be used in two ways. One is reference parsing, in which attribute values are extracted from OCR-processed references for bibliographic matching. The other is reference alignment, in which attribute values are aligned to the bibliographic record to enrich the vocabulary of the bibliographic database. We first propose a statistical model for attribute extraction that represents both the syntactical structure of references and OCR error patterns. We then perform experiments using bibliographic references obtained from scanned images of papers in journals and transactions, and show that useful attribute values are extracted from OCR-processed references. We also show that the proposed model has advantages in reducing the cost of preparing training data, a critical problem in rule-based systems.
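To make the idea of labeling reference tokens with a hidden Markov model concrete, the following is a minimal sketch of Viterbi decoding over a plain left-to-right HMM. It is not the extended model proposed in the paper. The state names, the token classes, every probability, the smoothing floor, and the sample reference string are hypothetical values chosen for illustration. The left-to-right transitions stand in for the syntactical structure of a reference, and the handling of "l" for "1" and "O" for "0" is only a crude stand-in for an OCR error model.

```python
import math
import re

# Hypothetical left-to-right structure: author(s), comma, title, comma, year.
STATES = ["author", "sep1", "title", "sep2", "year"]
FLOOR = 1e-6  # smoothing for events missing from the toy tables below

# All probabilities are made-up illustrative values, not trained estimates.
START = {"author": 1.0}
TRANS = {
    "author": {"author": 0.6, "sep1": 0.4},
    "sep1": {"title": 1.0},
    "title": {"title": 0.7, "sep2": 0.3},
    "sep2": {"year": 1.0},
    "year": {"year": 1.0},
}
EMIT = {
    "author": {"initial": 0.5, "capword": 0.45, "word": 0.05},
    "sep1": {"comma": 1.0},
    "title": {"capword": 0.3, "word": 0.7},
    "sep2": {"comma": 1.0},
    "year": {"digits": 1.0},
}


def logp(table, key):
    """Log-probability of key, falling back to FLOOR for unseen events."""
    return math.log(table.get(key, FLOOR))


def tokenize(reference):
    """Split on whitespace and keep each comma as its own token."""
    return re.findall(r"[^\s,]+|,", reference)


def token_class(token):
    """Map a token to a coarse observation class."""
    if token == ",":
        return "comma"
    core = token.strip(".;:()")
    # Tolerate two common OCR confusions (l for 1, O for 0) in numbers.
    normalized = core.replace("l", "1").replace("O", "0")
    if normalized.isdigit() and any(ch.isdigit() for ch in core):
        return "digits"
    if len(core) == 1 and core.isupper():
        return "initial"
    if core[:1].isupper():
        return "capword"
    return "word"


def viterbi(tokens):
    """Return the most probable state sequence for the tokens."""
    if not tokens:
        return []
    obs = [token_class(t) for t in tokens]
    scores = {s: logp(START, s) + logp(EMIT[s], obs[0]) for s in STATES}
    backpointers = []
    for o in obs[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda r: scores[r] + logp(TRANS[r], s))
            new_scores[s] = (
                scores[best_prev] + logp(TRANS[best_prev], s) + logp(EMIT[s], o)
            )
            pointers[s] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def extract(reference):
    """Group tokens by decoded attribute, dropping separator states."""
    tokens = tokenize(reference)
    fields = {}
    for token, state in zip(tokens, viterbi(tokens)):
        if not state.startswith("sep"):
            fields.setdefault(state, []).append(token)
    return {name: " ".join(parts) for name, parts in fields.items()}


print(extract("J. Smith, Parsing noisy references, l998"))
```

In this sketch the misrecognized year token "l998" is still assigned to the year attribute, but its text is left uncorrected. A model that represents OCR error patterns directly, as the abstract describes, would need emission distributions that account for character-level substitutions rather than the fixed replacement used here.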