The creation of structured digital libraries from paper-based archives is in growing demand in many scientific and cultural fields, and this demand is met neither by off-the-shelf OCR nor by commercial form-processing systems. This paper describes and evaluates a configurable archive construction system that integrates document image pre-processing and analysis with text post-processing tools and a standard OCR package. The prototype system is currently being used in conjunction with the UK Natural History Museum to help convert more than 500,000 cards of Lepidoptera and Coleoptera into a searchable digital archive. Evaluation results are summarised for two datasets comprising over 5,000 cards selected from different parts of this database; they indicate that overall end-to-end word recognition rates of 70-90% are readily achievable for key data fields, subject to the availability of suitable electronic dictionaries.