Document analysis applied to fragments: feature set for the reconstruction of torn documents
DAS '10 Proceedings of the 9th IAPR International Workshop on Document Analysis Systems
Proceeding of the workshop on Document Analysis and Recognition
This paper presents a novel three-module approach to underline detection and removal in Chinese/English OCR. The detection module combines connected component analysis with bottom edge analysis. The removal module applies different methods to different kinds of underlines. The disambiguation module compares recognition confidence to reduce the risk of wrongly removing doubtful underlines. Our approach can handle untouched, touched, broken, and slightly curved underlines. In a benchmark test using single text line images extracted from the UW-I database and images captured by C-Pen, we demonstrate that our approach has little negative effect on pure-text images and can reliably detect and remove underlines in text line images that contain them.
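The detect, remove, and disambiguate flow can be illustrated with a minimal sketch. This is not the authors' implementation: it covers only the simplest case (an untouched underline that forms its own connected component), and the function names, the width and height thresholds, and the confidence rule are all assumptions made for illustration. Bottom edge analysis, which the abstract names for harder cases, is not shown.

```python
from collections import deque


def connected_components(img):
    """Return the 8-connected foreground (value 1) components as pixel lists."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or seen[y][x]:
                continue
            queue = deque([(y, x)])
            seen[y][x] = True
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(pixels)
    return comps


def is_underline_candidate(pixels, line_width,
                           min_width_ratio=0.5, max_height=2):
    """Hypothetical rule: a wide, thin component is treated as an underline.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    width = max(xs) - min(xs) + 1
    height = max(ys) - min(ys) + 1
    return width >= min_width_ratio * line_width and height <= max_height


def remove_underlines(img):
    """Erase every candidate component; return (cleaned image, count removed)."""
    out = [row[:] for row in img]
    removed = 0
    for pixels in connected_components(img):
        if is_underline_candidate(pixels, len(img[0])):
            for y, x in pixels:
                out[y][x] = 0
            removed += 1
    return out, removed


def keep_removal(conf_original, conf_cleaned):
    """Assumed disambiguation rule: accept the removal only if the OCR
    confidence on the cleaned image exceeds that on the original."""
    return conf_cleaned > conf_original
```

In use, a caller would run its recognizer on both the original and the cleaned text line image and pass the two confidence scores to `keep_removal`; the recognizer itself is outside the scope of this sketch.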