Digital Libraries and Document Image Analysis

  • Authors:
  • Henry S. Baird

  • Venue:
  • ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 1
  • Year:
  • 2003

Abstract

The rapid growth of digital libraries (DLs) worldwide poses many new challenges for document image analysis (DIA) research and development. DLs promise to offer more people access to larger document collections, and at far greater speed, than physical libraries can. But DLs also tend, for many reasons, to serve poorly, or even to omit entirely, many types of non-digital human-legible media, such as originally printed and handwritten documents. These media, in their original physical (undigitized) form, are readily - if not always quickly - legible, searchable, and browseable, whereas in the form of document images accessed through DLs they often lose many of their original advantages while of course lacking many advantages of symbolically encoded information. The author explores these issues and illustrates them with brief case studies arising from his experience as a DIA researcher in collaboration with several DL projects in the US. Difficult open DIA technical problems in DL applications are identified in the contrasting advantages of paper and digital displays, at every stage of capture, early processing, recognition, analysis, presentation, & retrieval, and in personal and interactive applications. These support the conclusion that the international DIA R&D community is urgently needed (because uniquely qualified) to provide new technology to help rescue from neglect - even, in many cases, eventual oblivion - the world's vast culturally irreplaceable legacy paper document collections.