Automatic localization of page segmentation errors
Proceedings of the 2011 Joint Workshop on Multilingual OCR and Analytics for Noisy Unstructured Text Data
Automatic localization and correction of line segmentation errors
Proceeding of the workshop on Document Analysis and Recognition
Multilingual OCR research and applications: an overview
Proceedings of the 4th International Workshop on Multilingual OCR
Document image segmentation algorithms primarily aim to separate text from graphics in the presence of complex layouts. However, for many non-Latin scripts, segmentation becomes challenging because of the characteristics of the script itself. In this paper, we empirically demonstrate that algorithms that succeed on Latin scripts may not be very effective for Indic and other complex scripts. We explain this by the differences in the spatial distribution of symbols across scripts. We argue that, for accurate results, the visual information used for segmentation needs to be augmented with other information, such as script models.
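The abstract does not specify which segmentation algorithms were evaluated. As a purely hypothetical illustration of how the spatial distribution of symbols can defeat a Latin-oriented heuristic, the sketch below uses a horizontal projection profile (counting ink pixels per row and splitting text lines at empty rows). This is not the authors' method. The toy images, the function names `row_profile` and `split_lines`, and the `threshold` parameter are all assumptions made for illustration. In the "Indic-like" toy, a single mark below the first line (standing in for a vowel sign) fills the inter-line gap, so the two lines merge.

```python
def row_profile(image):
    """Count ink pixels ('#') in each row of a toy binary image (list of strings)."""
    return [row.count("#") for row in image]


def split_lines(image, threshold=0):
    """Split an image into text lines at rows whose ink count is <= threshold.

    Hypothetical helper for illustration only. Returns (start_row, end_row)
    pairs with end_row exclusive.
    """
    profile = row_profile(image)
    lines = []
    start = None
    for i, count in enumerate(profile):
        if count > threshold and start is None:
            start = i  # first inked row of a new line
        elif count <= threshold and start is not None:
            lines.append((start, i))  # gap row closes the current line
            start = None
    if start is not None:
        lines.append((start, len(profile)))  # line running to the bottom edge
    return lines


# Toy Latin-like page: a fully blank row separates the two lines.
latin = [
    "#### ### ####",
    "#### ### ####",
    "             ",
    "### #### ####",
    "### #### ####",
]

# Toy Indic-like page: a headline per line, plus one mark below line 1
# that intrudes into the gap row between the lines (assumed layout).
indic_like = [
    "#############",
    "#  #  #  #  #",
    "   #         ",
    "#############",
    "#  #  #  #  #",
]

print(split_lines(latin))                    # two lines found
print(split_lines(indic_like))               # gap is filled, lines merge
print(split_lines(indic_like, threshold=1))  # a tolerance recovers the split
```

Raising `threshold` recovers the split in this toy case, but on real pages a fixed tolerance can also cut through thin strokes. This is consistent with the abstract's argument that purely visual cues may need to be supplemented by script-level knowledge.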