A Segmentation Method for Bibliographic References by Contextual Tagging of Fields

  • Authors:
  • Dominique Besagni; Abdel Belaïd; Nelly Benet

  • Venue:
  • ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 1
  • Year:
  • 2003

Abstract

In this paper, a method based on part-of-speech (PoS) tagging is used to structure bibliographic references. This method operates on a roughly structured ASCII file produced by OCR. Because of the heterogeneity of the reference structure, the method acts in a bottom-up way, without an a priori model, gathering structural elements from basic tags to sub-fields and fields. Significant tags are first grouped into homogeneous classes according to their grammatical categories and then reduced to canonical forms corresponding to record fields: "authors", "title", "conference name", "date", etc. Unlabelled tokens are integrated into one field or another either by applying PoS correction rules or by using a structure model generated from well-detected records. The designed prototype performs satisfactorily on different record layouts and character recognition qualities. Without manual intervention, 96.6% of words are correctly attributed, and about 75.9% of references are completely segmented, out of 2500 references.
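
The bottom-up idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tag set, the regular expressions, the `VENUE_WORDS` list, and the function names `basic_tag`, `fill_unlabelled`, and `segment` are all hypothetical, and the "correction rule" used here (an unlabelled token inherits the label of the nearest labelled token to its left) is a deliberately naive stand-in for the PoS correction rules and structure model that the paper refers to.

```python
import re
from itertools import groupby

# Hypothetical basic tags; the paper's actual PoS tag set is not given here.
YEAR = re.compile(r"^(19|20)\d{2}$")
INITIAL = re.compile(r"^[A-Z]\.$")
VENUE_WORDS = {"Proc.", "Proceedings", "Conference", "Conf."}


def basic_tag(token):
    """Assign a field label to a token, or None if it is not significant."""
    core = token.strip(",;()")
    if YEAR.match(core):
        return "date"
    if INITIAL.match(core):
        return "authors"
    if core in VENUE_WORDS:
        return "conference"
    return None


def fill_unlabelled(tags):
    """Naive correction rule: an unlabelled token joins the field on its left.

    Leading unlabelled tokens join the first labelled field; if nothing is
    labelled at all, every token is marked "unknown".
    """
    filled = list(tags)
    last = None
    for i, tag in enumerate(filled):
        if tag is None:
            filled[i] = last
        else:
            last = tag
    first = next((t for t in filled if t is not None), "unknown")
    return [t if t is not None else first for t in filled]


def segment(reference):
    """Tag tokens, absorb unlabelled ones, then merge runs into fields."""
    tokens = reference.split()
    tags = fill_unlabelled([basic_tag(t) for t in tokens])
    fields = []
    for tag, group in groupby(zip(tags, tokens), key=lambda pair: pair[0]):
        fields.append((tag, " ".join(tok for _, tok in group)))
    return fields


if __name__ == "__main__":
    for field in segment("D. Besagni, A. Belaid, Proc. ICDAR, 2003"):
        print(field)
```

On this toy reference the sketch yields three fields (authors, conference, date). A reference that includes a title would expose the weakness of the naive rule, since the title words would be absorbed into the preceding authors field; handling such cases is presumably where the PoS correction rules and the structure model learned from well-detected records come in.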