Automatic Indexing of Newspaper Microfilm Images

Authors:
Qing H. Liu; Chew Lim Tan
Venue:
DAS '02 Proceedings of the 5th International Workshop on Document Analysis Systems V
Year:
2002

Abstract

This paper describes a document analysis system for automatically indexing digitized images of old newspaper microfilm. The system extracts news headlines from the microfilm images and converts them to machine-readable text by OCR; the recognized headlines then serve as indices to the corresponding news articles. A major challenge is the poor quality of the microfilm images, most of which are inadequately illuminated and considerably dirty. Because conventional threshold selection techniques are inadequate for such images, we propose a new method for separating characters from a noisy background. A Run Length Smearing Algorithm (RLSA) is then applied to extract the headlines. Experimental results confirm the validity of the approach.
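The abstract names RLSA but gives no parameters. The following is a minimal sketch of the classic RLSA as it is commonly described (horizontal smearing, vertical smearing, a logical AND of the two results, then a final horizontal pass). It is not the authors' implementation. It assumes a binary image has already been produced by the character-separation step and is stored as a rectangular list of rows, with 1 for ink and 0 for background. The function names and threshold values are hypothetical. This version fills only white gaps bounded by black pixels on both sides, which is one of several common implementation choices.

```python
from typing import List

Image = List[List[int]]  # assumption: 1 = black (ink), 0 = white (background)


def smear_row(row: List[int], threshold: int) -> List[int]:
    """Fill white runs of length <= threshold lying between two black pixels."""
    out = list(row)
    last_black = None  # index of the most recent black pixel seen so far
    for i, px in enumerate(row):
        if px == 1:
            gap = i - last_black - 1 if last_black is not None else 0
            if 0 < gap <= threshold:
                for j in range(last_black + 1, i):
                    out[j] = 1  # short gap: smear it to black
            last_black = i
    return out


def transpose(img: Image) -> Image:
    """Swap rows and columns so vertical runs can be smeared as rows."""
    return [list(col) for col in zip(*img)]


def rlsa(img: Image, h_thresh: int, v_thresh: int, final_h_thresh: int) -> Image:
    """Classic RLSA sketch; all thresholds are hypothetical, in pixels."""
    horiz = [smear_row(r, h_thresh) for r in img]
    vert = transpose([smear_row(c, v_thresh) for c in transpose(img)])
    # Keep only pixels that are black after both directional smears.
    combined = [[a & b for a, b in zip(ra, rb)] for ra, rb in zip(horiz, vert)]
    # A final, usually smaller, horizontal smear joins nearby fragments.
    return [smear_row(r, final_h_thresh) for r in combined]


if __name__ == "__main__":
    page = [
        [1, 0, 1, 0, 0, 0, 0, 1],
        [0, 0, 0, 0, 0, 0, 0, 0],
        [1, 0, 1, 0, 0, 0, 0, 1],
    ]
    for row in rlsa(page, h_thresh=2, v_thresh=2, final_h_thresh=1):
        print(row)
```

In practice the thresholds would have to be tuned to the scan resolution and type size. After smearing, connected black regions form candidate text blocks, such as headline lines, that a later layout-analysis step could select and pass to OCR.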