The Segmentation and Identification of Handwriting in Noisy Document Images
DAS '02 Proceedings of the 5th International Workshop on Document Analysis Systems V
In this paper we address the problem of identifying text in noisy documents. We segment and distinguish handwriting from machine printed text because 1) handwriting in a document often indicates corrections, additions or other supplemental information that should be treated differently from the main or body content, and 2) the segmentation and recognition techniques for machine printed text and handwriting are significantly different. Our novelty is that we treat noise as a separate class and model noise based on selected features. Trained Fisher classifiers are used to separate machine printed text and handwriting from noise. We further exploit context to refine the classification. A Markov Random Field (MRF) based approach is used to model the geometrical structure of the printed text, handwriting and noise in order to rectify misclassifications. Experimental results show our approach is promising and robust, and can significantly improve the page segmentation results in noisy documents.
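
To make the Fisher classification step concrete, the following is a minimal sketch of a two-class Fisher linear discriminant, not the authors' implementation. It assumes each block is described by just two features and separates only "text" from "noise"; the abstract describes three classes and does not list its features here. The feature values, the function names (`fit_fisher`, `classify`) and the midpoint threshold are all illustrative assumptions.

```python
# Two-class Fisher linear discriminant on 2-D feature vectors (pure Python).
# All data and names below are hypothetical, chosen only for illustration.

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def mean(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(2)]

def scatter(vectors, mu):
    # 2x2 scatter matrix: sum over samples of (x - mu)(x - mu)^T
    s = [[0.0, 0.0], [0.0, 0.0]]
    for v in vectors:
        d = [v[0] - mu[0], v[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fit_fisher(class_a, class_b):
    """Return (w, threshold) with w = Sw^-1 (mu_a - mu_b)."""
    mu_a, mu_b = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, mu_a), scatter(class_b, mu_b)
    # Within-class scatter is the sum of the per-class scatter matrices.
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu_a[0] - mu_b[0], mu_a[1] - mu_b[1]]
    w = [dot(inv[0], diff), dot(inv[1], diff)]
    # Assumed decision rule: midpoint between the projected class means.
    threshold = 0.5 * (dot(w, mu_a) + dot(w, mu_b))
    return w, threshold

def classify(x, w, threshold):
    # Projections above the threshold fall on the class_a ("text") side.
    return "text" if dot(w, x) > threshold else "noise"

# Hypothetical training features per block: (stroke regularity, speckle density)
text_blocks = [(0.9, 0.2), (0.8, 0.3), (1.0, 0.25), (0.85, 0.1)]
noise_blocks = [(0.2, 0.8), (0.1, 0.9), (0.3, 0.7), (0.25, 0.95)]

w, threshold = fit_fisher(text_blocks, noise_blocks)
print(classify((0.9, 0.2), w, threshold))
print(classify((0.15, 0.85), w, threshold))
```

A contextual refinement stage such as the MRF described above would then take these per-block labels as its initial estimate and revise them using the labels of neighboring blocks.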