Extraction of Type Style Based Meta-Information from Imaged Documents

Authors:
U. Garain; B. B. Chaudhuri
Venue:
ICDAR '99 Proceedings of the Fifth International Conference on Document Analysis and Recognition
Year:
1999

Abstract

This paper considers the extraction of meta-information from printed documents without resorting to OCR. It can be verified statistically that important terms in articles are typically printed in italic, bold, or all-capital style. Detecting these type styles helps to automatically extract the lines containing titles, authors' names, subtitles, and references, as well as sentences in the body text that contain important terms. It also helps improve OCR performance on italicized text. Experimental results on both good-quality and degraded document images are presented.
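
The abstract does not say how the type styles are detected, so the following is only a generic illustration of one OCR-free cue, not the authors' algorithm. It sketches a common slant test for italic text: shear a binarized text-line image by several candidate amounts and keep the shear that makes the column ink profile most peaked, since vertical strokes line up best once the slant is undone. The function names, the candidate shear values, and the `min_shear` threshold are all assumptions made for this example.

```python
from typing import List, Sequence

Image = List[List[int]]  # rows of pixels; 1 = ink, 0 = background


def sheared_profile(img: Image, shear: float) -> List[int]:
    """Column ink counts after undoing a right-leaning slant of `shear`.

    Each row is shifted left by shear * (distance from the bottom row),
    so a stroke leaning right by `shear` collapses into a single column.
    """
    height, width = len(img), len(img[0])
    pad = int(abs(shear) * height) + 1  # room for shifted pixels on both sides
    profile = [0] * (width + 2 * pad)
    for y, row in enumerate(img):
        shift = round(shear * (height - 1 - y))
        for x, pixel in enumerate(row):
            if pixel:
                profile[x - shift + pad] += 1
    return profile


def estimate_shear(
    img: Image,
    candidates: Sequence[float] = (-0.5, -0.25, 0.0, 0.25, 0.5),  # assumed grid
) -> float:
    """Return the candidate shear giving the most peaked column profile."""
    return max(
        candidates,
        key=lambda s: sum(count * count for count in sheared_profile(img, s)),
    )


def looks_italic(img: Image, min_shear: float = 0.2) -> bool:
    """Flag a line as italic if its estimated slant exceeds an assumed threshold."""
    return estimate_shear(img) >= min_shear


# Two synthetic 8 x 12 "text lines", each holding a single stroke.
HEIGHT, WIDTH = 8, 12
upright = [[1 if x == 5 else 0 for x in range(WIDTH)] for _ in range(HEIGHT)]
slanted = [
    [1 if x == 3 + round(0.5 * (HEIGHT - 1 - y)) else 0 for x in range(WIDTH)]
    for y in range(HEIGHT)
]

print(estimate_shear(upright), looks_italic(upright))
print(estimate_shear(slanted), looks_italic(slanted))
```

A real page would need line segmentation and binarization first, and a finer shear grid. Bold and all-capital text call for different cues (for example stroke thickness and the absence of ascender/descender variation), which this sketch does not cover.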