Processing handwritten words by intelligent use of OCR results

  • Authors:
  • Benjamin Mund;Karl-Heinz Steinke

  • Affiliations:
  • University of Applied Sciences and Arts, Hanover, Hanover, Germany;University of Applied Sciences and Arts, Hanover, Hanover, Germany

  • Venue:
  • ICDM'10 Proceedings of the 10th industrial conference on Advances in data mining: applications and theoretical aspects
  • Year:
  • 2010

Abstract

About 3.5 million dried plants mounted on paper sheets are held in the Botanical Museum Berlin in Germany. Many of the sheets carry handwritten annotations (see figure 1), so a procedure had to be developed to process the handwriting on them. The present work describes an approach that identifies the writer from handwritten words and reads handwritten keywords. To this end, each word is cut out, transformed into a 6-dimensional time series, and compared, e.g. by means of the dynamic time warping (DTW) method. A recognition rate of 98.6% is achieved with 12 different words (1200 samples). All herbarium sheets contain several printed tokens that indicate further information about the plant. These tokens make it possible to determine who found the plant, where it was found (the country and sometimes the town), what kind of plant it is, and so on. By exploiting the local relationships within the text, it is possible to extract more information from a herbarium sheet, e.g. to find and recognize handwritten text in a defined area.
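
The comparison step mentioned in the abstract can be illustrated with a minimal sketch of DTW on multi-dimensional time series. The abstract does not say which six features are extracted, which local distance is used, or how the final decision is made, so everything below is an assumption for illustration: the Euclidean frame distance, the unconstrained warping path, the nearest-reference decision rule, and the names `dtw_distance` and `nearest_label`.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors.

    Each sequence is a list of equal-length tuples (6 values per frame in
    the setting of the abstract). The Euclidean frame distance is an
    assumption, not taken from the paper.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j],      # advance in a only
                                 cost[i][j - 1],      # advance in b only
                                 cost[i - 1][j - 1])  # advance in both
    return cost[n][m]


def nearest_label(query, references):
    """Return the label of the reference with the smallest DTW cost.

    `references` is a list of (label, sequence) pairs. This 1-nearest-
    neighbour rule is a hypothetical stand-in for the paper's classifier.
    """
    return min(references, key=lambda ref: dtw_distance(query, ref[1]))[0]
```

With references such as `[("writer_A", seq_a), ("writer_B", seq_b)]`, a call like `nearest_label(query_seq, references)` returns the label whose sample warps onto the query at the lowest cost. Because DTW allows frames to be repeated, two samples of the same word written at different widths can still align closely.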