Extraction of Indicative Summary Sentences from Imaged Documents

Authors:
Francine R. Chen; Dan S. Bloomberg
Venue:
ICDAR '97 Proceedings of the 4th International Conference on Document Analysis and Recognition
Year:
1997

Abstract

This paper presents a system that selects sentences from an imaged document for inclusion in a document summary. The extracts are identified without the use of optical character recognition. The imaged document is first processed to identify the word locations, the reading order of words, and the locations of sentence and paragraph boundaries in the text. The words are then grouped into equivalence classes to mimic the terms in a text document. Sentences are selected using a set of discrete features that characterize the words within a sentence and the location of the sentence within the imaged document, and a statistically based classifier scores each sentence from the values of these features. A sample extract for a technical document is shown, and an evaluation against a set of abstracts created by a professional abstracting company is given. These results are compared with text-based abstracts.
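
The scoring step described in the abstract can be illustrated with a small sketch. The abstract says only that sentences are scored from discrete features by a "statistically based classifier"; the naive Bayes model below is an assumption chosen for illustration, not necessarily the authors' classifier. The feature names (`position`, `frequent_term`), the training data, and the helpers `SentenceScorer` and `select_top` are all hypothetical.

```python
import math
from collections import defaultdict


class SentenceScorer:
    """Naive Bayes scorer over discrete sentence features (illustrative assumption)."""

    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing
        # Number of training sentences per label (True = belongs in the summary).
        self.class_counts = {True: 0, False: 0}
        # Count of each (feature name, value) pair per label.
        self.feature_counts = {True: defaultdict(int), False: defaultdict(int)}
        # Distinct values observed for each feature, used for smoothing.
        self.feature_values = defaultdict(set)

    def fit(self, examples):
        """examples: iterable of (features dict, in_summary bool) pairs."""
        for features, in_summary in examples:
            self.class_counts[in_summary] += 1
            for name, value in features.items():
                self.feature_counts[in_summary][(name, value)] += 1
                self.feature_values[name].add(value)

    def _log_joint(self, features, label):
        total = self.class_counts[label]
        n_all = sum(self.class_counts.values())
        # Smoothed log prior for the label.
        logp = math.log((total + self.smoothing) / (n_all + 2 * self.smoothing))
        for name, value in features.items():
            n_values = len(self.feature_values[name] | {value})
            count = self.feature_counts[label][(name, value)]
            # Smoothed log likelihood of this feature value given the label.
            logp += math.log(
                (count + self.smoothing) / (total + self.smoothing * n_values)
            )
        return logp

    def score(self, features):
        """Log-odds that a sentence belongs in the summary; higher is better."""
        return self._log_joint(features, True) - self._log_joint(features, False)


def select_top(scorer, sentences, k):
    """Return the indices of the k best-scoring sentences, in reading order."""
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: scorer.score(sentences[i]),
        reverse=True,
    )
    return sorted(ranked[:k])


# Hypothetical training data: location in the paragraph and whether the sentence
# contains a frequently occurring word equivalence class.
TRAINING = [
    ({"position": "first", "frequent_term": True}, True),
    ({"position": "first", "frequent_term": False}, True),
    ({"position": "middle", "frequent_term": True}, False),
    ({"position": "middle", "frequent_term": False}, False),
    ({"position": "last", "frequent_term": True}, True),
    ({"position": "last", "frequent_term": False}, False),
]

scorer = SentenceScorer()
scorer.fit(TRAINING)
```

In an image-based setting, the feature values would be derived from word-image equivalence classes and layout analysis rather than from recognized text, so the scorer itself never needs character-level output.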