Evaluation of a User-Assisted Archive Construction System for Online Natural History Archives

  • Authors:
  • J. He; A. C. Downton

  • Affiliations:
  • University of Essex, UK; University of Essex, UK

  • Venue:
  • ICDAR '05 Proceedings of the Eighth International Conference on Document Analysis and Recognition
  • Year:
  • 2005


Abstract

The creation of structured digital libraries from paper-based archives is an area of growing demand in many scientific and cultural fields, and this demand is met neither by off-the-shelf OCR nor by commercial form-processing systems. This paper describes and evaluates a configurable archive construction system that integrates document image pre-processing and analysis with text post-processing tools and a standard OCR package. The prototype system is currently being used in collaboration with the UK Natural History Museum to help convert more than 500,000 Lepidoptera and Coleoptera cards into a searchable digital archive. Evaluation results are summarised for two datasets comprising over 5,000 cards selected from different parts of this database; they indicate that end-to-end word recognition rates of 70-90% are readily achievable for key data fields, provided that suitable electronic dictionaries are available.
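The abstract ties the achievable word recognition rate to the availability of electronic dictionaries for post-processing OCR output. The sketch below illustrates that general idea under stated assumptions and is not the authors' method: each OCR token is snapped to its nearest dictionary entry by Levenshtein edit distance if it lies within a hypothetical threshold `max_dist`, and the word recognition rate is computed as the fraction of position-aligned ground-truth words reproduced exactly. The function names, the threshold of 2, and the sample taxon names are illustrative assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb)))  # substitution (0 if equal)
        prev = curr
    return prev[-1]


def correct_word(ocr_word: str, dictionary: list[str], max_dist: int = 2) -> str:
    """Hypothetical post-processor: replace an OCR token with its closest
    dictionary entry (case-insensitive), but only if it is within max_dist edits."""
    if not dictionary:
        return ocr_word
    best = min(dictionary, key=lambda w: levenshtein(ocr_word.lower(), w.lower()))
    if levenshtein(ocr_word.lower(), best.lower()) <= max_dist:
        return best
    return ocr_word  # too far from any entry: leave the token unchanged


def word_recognition_rate(predicted: list[str], truth: list[str]) -> float:
    """Fraction of ground-truth words reproduced exactly, assuming the
    predicted and ground-truth word sequences are already aligned by position."""
    if not truth:
        return 0.0
    correct = sum(p == t for p, t in zip(predicted, truth))
    return correct / len(truth)


if __name__ == "__main__":
    # Illustrative data only; not taken from the paper's datasets.
    dictionary = ["Lepidoptera", "Coleoptera", "Papilio", "machaon"]
    truth = ["Lepidoptera", "Papilio", "machaon"]
    ocr = ["Lepidoptcra", "Papi1io", "machaon"]  # simulated OCR errors

    corrected = [correct_word(w, dictionary) for w in ocr]
    print(f"raw WRR:       {word_recognition_rate(ocr, truth):.2f}")        # 0.33
    print(f"corrected WRR: {word_recognition_rate(corrected, truth):.2f}")  # 1.00
```

Restricting corrections to a small edit distance keeps the post-processor from forcing genuinely unknown words onto unrelated dictionary entries. This is also why such a scheme helps only when the dictionary actually covers the field's vocabulary.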