Document analysis system

Authors:
K. Y. Wong;R. G. Casey;F. M. Wahl
Affiliations:
IBM Research Division, San Jose, California;IBM Research Division, San Jose, California;Lehrstuhl für Nachrichtentechnik, Technische Universität, Munich 2, Federal Republic of Germany
Venue:
IBM Journal of Research and Development
Year:
1982

Abstract

This paper outlines the requirements and components for a proposed Document Analysis System, which assists a user in encoding printed documents for computer processing. Several critical functions have been investigated and the technical approaches are discussed. The first is the segmentation and classification of digitized printed documents into regions of text and images. A nonlinear, run-length smoothing algorithm has been used for this purpose. By using the regular features of text lines, a linear adaptive classification scheme discriminates text regions from others. The second technique studied is an adaptive approach to the recognition of the hundreds of font styles and sizes that can occur on printed documents. A preclassifier is constructed during the input process and used to speed up a well-known pattern-matching method for clustering characters from an arbitrary print source into a small sample of prototypes. Experimental results are included.
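The abstract names a nonlinear run-length smoothing algorithm for segmentation but gives no details. A minimal Python sketch follows. It assumes a binary bitmap with 1 for black and 0 for white, in which any run of white pixels no longer than a limit C is filled with black. The horizontal pass, vertical pass, logical AND and final horizontal pass follow the way this smoothing is commonly described in the literature. The function names and threshold values are illustrative assumptions, not taken from the paper.

```python
from typing import List

# Assumed encoding: 1 = black (ink), 0 = white (background).
Bitmap = List[List[int]]


def smooth_row(row: List[int], limit: int) -> List[int]:
    """Fill every maximal run of 0s whose length is <= limit with 1s.
    Runs touching either end of the row are treated the same way."""
    out = list(row)
    n = len(row)
    i = 0
    while i < n:
        if row[i] == 0:
            j = i
            while j < n and row[j] == 0:
                j += 1
            # row[i:j] is a maximal white run of length j - i
            if j - i <= limit:
                for k in range(i, j):
                    out[k] = 1
            i = j
        else:
            i += 1
    return out


def transpose(img: Bitmap) -> Bitmap:
    return [list(col) for col in zip(*img)]


def rlsa(img: Bitmap, c_h: int, c_v: int, c_final: int) -> Bitmap:
    """Smooth rows with limit c_h and columns with limit c_v, AND the two
    results, then smooth rows once more with the (smaller) limit c_final."""
    horiz = [smooth_row(r, c_h) for r in img]
    vert = transpose([smooth_row(c, c_v) for c in transpose(img)])
    both = [[a & b for a, b in zip(rh, rv)] for rh, rv in zip(horiz, vert)]
    return [smooth_row(r, c_final) for r in both]


if __name__ == "__main__":
    row = [int(ch) for ch in "00010000010100001000000011000"]
    print("".join(map(str, smooth_row(row, 4))))  # 11110000011111111000000011111
```

With a limit of 4, the row `00010000010100001000000011000` becomes `11110000011111111000000011111`. Short gaps between strokes close, while the white runs of five and seven pixels survive. On a real page, the thresholds would be tuned to the scan resolution so that characters merge into words and lines while separate columns and paragraphs stay apart.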
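The abstract says only that a linear adaptive scheme uses the regular features of text lines to separate text blocks from the rest. The sketch below shows one way such a rule could look. It assumes that each block produced by the smoothing step carries three values: its height, its black-pixel count in the original image, and its number of white-to-black transitions along rows. The features, the median-based reference values and the constants `c_height` and `c_run` are hypothetical choices for illustration, not the paper's parameters.

```python
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class Block:
    height: int        # block height in pixels (hypothetical feature)
    black_pixels: int  # black pixels of the original image inside the block
    transitions: int   # white-to-black transitions counted along rows


def mean_run_length(b: Block) -> float:
    # Average horizontal black run length: black pixels per black run.
    return b.black_pixels / max(b.transitions, 1)


def classify(blocks: List[Block], c_height: float = 3.0, c_run: float = 3.0) -> List[str]:
    """Label blocks 'text' or 'non-text' with linear thresholds scaled by
    page-wide reference values (medians over all blocks, an assumption)."""
    if not blocks:
        return []
    ref_h = median(b.height for b in blocks)
    ref_r = median(mean_run_length(b) for b in blocks)
    labels = []
    for b in blocks:
        # Text lines are short and made of short black runs (strokes).
        is_text = b.height < c_height * ref_h and mean_run_length(b) < c_run * ref_r
        labels.append("text" if is_text else "non-text")
    return labels
```

Comparing each block with page-wide reference values makes the thresholds adaptive, so the same constants can serve whether the page is set in small or large type. The median is used so that a few large picture blocks do not pull the reference away from typical text lines. A tall picture fails the height test, and a thin solid rule fails the run-length test.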
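The second technique clusters characters into prototypes and uses a preclassifier to reduce the number of full comparisons. The sketch below makes three assumptions: glyphs are already isolated as small bitmaps, a hypothetical preclassifier buckets them by height and width, and a simple pixel-mismatch count stands in for the "well-known pattern-matching method", which the abstract does not name. A glyph joins the first prototype in its bucket that differs in at most `max_mismatch` pixels. Otherwise it starts a new prototype.

```python
from typing import Dict, List, Tuple

# Hypothetical glyph encoding: a tuple of equal-length rows of '0'/'1'.
Glyph = Tuple[str, ...]


def preclass_key(g: Glyph) -> Tuple[int, int]:
    # Hypothetical preclassifier: bucket glyphs by (height, width).
    return (len(g), len(g[0]) if g else 0)


def mismatch(a: Glyph, b: Glyph) -> int:
    # Count of differing pixels between two glyphs of the same size.
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def cluster(glyphs: List[Glyph], max_mismatch: int) -> Tuple[List[Glyph], List[int]]:
    """Assign each glyph a prototype index, creating prototypes on demand.
    Only prototypes in the glyph's preclassifier bucket are compared."""
    prototypes: List[Glyph] = []
    buckets: Dict[Tuple[int, int], List[int]] = {}
    labels: List[int] = []
    for g in glyphs:
        candidates = buckets.setdefault(preclass_key(g), [])
        match = next(
            (p for p in candidates if mismatch(g, prototypes[p]) <= max_mismatch),
            None,
        )
        if match is None:
            # No close prototype in this bucket: g becomes a new prototype.
            match = len(prototypes)
            prototypes.append(g)
            candidates.append(match)
        labels.append(match)
    return prototypes, labels
```

A glyph is compared only with prototypes in its own bucket, so the number of mismatch computations grows with bucket size rather than with the total number of prototypes. Exact-size buckets are a simplification; a practical preclassifier would tolerate small size differences caused by scanning noise.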