In this paper, a method based on part-of-speech (PoS) tagging is used for structuring bibliographic references. This method operates on a roughly structured ASCII file produced by OCR. Because of the heterogeneity of the reference structure, the method acts in a bottom-up way, without an a priori model, gathering structural elements from basic tags to sub-fields and fields. Significant tags are first grouped into homogeneous classes according to their grammatical categories and then reduced to canonical forms corresponding to record fields: "authors", "title", "conference name", "date", etc. Unlabelled tokens are integrated into one field or another either by applying PoS correction rules or by using a structure model generated from well-detected records. The designed prototype performs satisfactorily on different record layouts and character recognition qualities. Without manual intervention, 96.6% of words are correctly attributed, and about 75.9% of references are completely segmented, on a set of 2500 references.
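The bottom-up idea described above can be illustrated with a minimal sketch. It is not the paper's tagger: the tag patterns, the two correction rules, the function names, and the sample reference are all invented for illustration, and the structure model learned from well-detected records is not covered. The sketch tags the tokens it recognizes, maps each tag to a record field, assigns the unlabelled tokens with simple neighbor-based rules, and finally merges adjacent tokens that share a field.

```python
import re

# Hypothetical basic tags (not the paper's tag set): each maps a token
# pattern directly to the record field it is reduced to.
TAG_RULES = [
    ("date", re.compile(r"^\(?(?:19|20)\d{2}\)?[.,]?$")),        # e.g. "1999."
    ("authors", re.compile(r"^[A-Z]\.,?$")),                     # initial, e.g. "J."
    ("conference", re.compile(r"^(?:Proceedings|Proc\.|Conference|Workshop)[,.]?$")),
]

CONNECTIVES = {"and", "&"}


def tag_token(token):
    """Return the field for a recognized token, or None if unlabelled."""
    for field, pattern in TAG_RULES:
        if pattern.match(token):
            return field
    return None


def fill_unlabelled(tokens, fields):
    """Assign unlabelled tokens to fields with two illustrative rules."""
    fields = list(fields)
    # Rule 1: an unlabelled token right before an initial is a surname.
    for i in range(len(tokens) - 1):
        if fields[i] is None and fields[i + 1] == "authors":
            fields[i] = "authors"
    # Rule 2: remaining tokens inherit the previous field, except that
    # after the author block only connectives stay in "authors";
    # anything else is assumed to start the title.
    previous = "title"
    for i, token in enumerate(tokens):
        if fields[i] is None:
            if previous == "authors" and token.lower() not in CONNECTIVES:
                fields[i] = "title"
            else:
                fields[i] = previous
        previous = fields[i]
    return fields


def segment_reference(reference):
    """Split a reference string into a list of (field, text) pairs."""
    tokens = reference.split()
    fields = fill_unlabelled(tokens, [tag_token(t) for t in tokens])
    segments = []
    for token, field in zip(tokens, fields):
        if segments and segments[-1][0] == field:
            # Same field as the previous token: extend the current segment.
            segments[-1] = (field, segments[-1][1] + " " + token)
        else:
            segments.append((field, token))
    return segments


if __name__ == "__main__":
    # Made-up reference, used only to exercise the sketch.
    sample = "Doe, J. and Roe, A. Parsing toy references. Proceedings of Some Workshop, 1999."
    for field, text in segment_reference(sample):
        print(f"{field}: {text}")
```

On the made-up sample, only the initials, the conference keywords, and the year are tagged directly. Every other token receives its field from a neighbor, which mirrors in a very reduced form how unlabelled tokens are absorbed into fields.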