Many libraries, museums, and other organizations hold large collections of handwritten historical documents, for example, the papers of early presidents like George Washington at the Library of Congress. The first step in providing recognition and retrieval tools is to automatically segment handwritten pages into words. State-of-the-art segmentation techniques, such as the gap metrics algorithm, have mostly been developed and tested on highly constrained documents like bank checks and postal addresses. There has been little work on full handwritten pages, and that work has usually been tested on clean artificial documents created for research purposes. Historical manuscript images, on the other hand, contain a great deal of noise and are much more challenging. Here, a novel scale space algorithm for automatically segmenting handwritten (historical) documents into words is described. First, the page is cleaned to remove margins. Lines are then located in the image with a gray-level projection profile algorithm. Each line image is then filtered with an anisotropic Laplacian at several scales. This procedure produces blobs which correspond to portions of characters at small scales and to words at larger scales. Crucial to the algorithm is scale selection, that is, finding the optimum scale at which blobs correspond to words. This is done by finding the maximum over scale of the extent or area of the blobs. This scale maximum is estimated using three different approaches. Each blob recovered at the optimum scale is then bounded with a rectangular box to recover the words. A postprocessing filtering step eliminates boxes of unusual size, which are unlikely to correspond to words. The approach is tested on a number of different data sets, and on 100 sampled documents from the George Washington corpus of handwritten document images a total error rate of 17 percent is observed.
The technique outperforms a state-of-the-art gap metrics word-segmentation algorithm on this collection.
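
The pipeline described above (projection-profile line finding, anisotropic Laplacian filtering at several scales, scale selection by blob area, bounding boxes, and size filtering) can be illustrated with a minimal Python sketch built on NumPy and SciPy. It is not the authors' implementation. The function names, the aspect ratio `eta`, the profile threshold `frac`, and the `min_area` filter are all assumptions introduced for illustration, and scale selection is reduced to a plain argmax of total blob area rather than any of the three estimation approaches mentioned in the abstract.

```python
import numpy as np
from scipy import ndimage


def to_ink(gray):
    """Invert a gray image so ink is bright and the background is zero."""
    gray = np.asarray(gray, dtype=float)
    return gray.max() - gray


def find_line_bands(page, sigma=2.0, frac=0.2):
    """Return (top, bottom) row ranges of text lines from a smoothed
    gray-level projection profile. `frac` is an assumed threshold."""
    profile = ndimage.gaussian_filter1d(to_ink(page).sum(axis=1), sigma)
    is_text = profile > frac * profile.max()
    labels, _ = ndimage.label(is_text)
    return [(s[0].start, s[0].stop) for s in ndimage.find_objects(labels)]


def blob_mask(line, sigma_y, eta=3.0):
    """Filter a line image with an anisotropic Laplacian of Gaussian
    (sigma_x = eta * sigma_y, an assumed ratio) and mark blob pixels."""
    ink = to_ink(line)
    sigma = (sigma_y, eta * sigma_y)  # (rows, columns)
    lyy = ndimage.gaussian_filter(ink, sigma=sigma, order=(2, 0))
    lxx = ndimage.gaussian_filter(ink, sigma=sigma, order=(0, 2))
    response = lxx + lyy
    # Bright ink gives a negative Laplacian; ignore numerical noise near zero.
    tol = 1e-6 * max(np.abs(response).max(), 1e-12)
    return response < -tol


def select_scale(line, sigmas, eta=3.0):
    """Pick the scale whose blobs cover the largest total area
    (a simplified stand-in for the paper's scale-selection step)."""
    areas = [blob_mask(line, s, eta).sum() for s in sigmas]
    return sigmas[int(np.argmax(areas))]


def word_boxes(line, sigma_y, eta=3.0, min_area=20):
    """Bound each blob with a box (x0, y0, x1, y1) and drop tiny boxes."""
    labels, _ = ndimage.label(blob_mask(line, sigma_y, eta))
    boxes = []
    for rows, cols in ndimage.find_objects(labels):
        box = (cols.start, rows.start, cols.stop, rows.stop)
        if (box[2] - box[0]) * (box[3] - box[1]) >= min_area:
            boxes.append(box)
    return sorted(boxes)  # left to right


if __name__ == "__main__":
    # Synthetic line: two dark "words" on a white background.
    line = np.full((40, 200), 255, dtype=np.uint8)
    line[10:30, 20:80] = 0
    line[10:30, 120:180] = 0
    best = select_scale(line, [1.0, 2.0, 4.0, 8.0])
    print("selected sigma_y:", best)
    print("boxes at sigma_y=2:", word_boxes(line, sigma_y=2.0))
```

Stretching the Gaussian along the writing direction (`eta > 1`) is what lets neighboring characters merge into a single blob before neighboring lines do. On real manuscripts the candidate scales, the threshold, and the size filter would all need tuning to the collection.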