Word and Sentence Extraction Using Irregular Pyramid
DAS '02 Proceedings of the 5th International Workshop on Document Analysis Systems V
Making Documents Work: Challenges for Document Understanding
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 2
A holistic methodology for keyword search in historical typewritten documents
SETN'06 Proceedings of the 4th Hellenic conference on Advances in Artificial Intelligence
Abstract: We introduce a technique, based on diagonal white runs and vertical edges, that divides a document image into columns and blocks, which are subsequently classified as text or graphics. A diagonal white run (drun) is a set of adjacent white pixels that are diagonally connected, and a vertical edge consists of the white area between two consecutive druns. The technique was designed to be layout-independent. Tests on document images of 14 different types and layouts, written in different languages, show comparable and promising results.
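
To make the drun definition concrete, the following is a minimal sketch of how diagonal white runs might be collected from a binary image. It is not the authors' implementation: the function name `diagonal_white_runs`, the `min_len` threshold, the convention that 0 means white, and the choice of the down-right diagonal direction are all assumptions made for illustration. The abstract does not specify how vertical edges are derived from consecutive druns, so that step is not attempted here.

```python
from typing import List, Tuple

WHITE = 0  # assumption: 0 = white background, 1 = black ink

Run = List[Tuple[int, int]]  # pixel coordinates as (row, col)


def diagonal_white_runs(image: List[List[int]], min_len: int = 2) -> List[Run]:
    """Collect maximal runs of white pixels along down-right diagonals.

    Hypothetical helper: the direction and the min_len filter are
    illustrative choices, not taken from the paper.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    runs: List[Run] = []
    # Every down-right diagonal starts on the top row or the left column.
    starts = [(0, c) for c in range(cols)] + [(r, 0) for r in range(1, rows)]
    for r, c in starts:
        current: Run = []
        while r < rows and c < cols:
            if image[r][c] == WHITE:
                current.append((r, c))
            else:
                # A black pixel ends the current run.
                if len(current) >= min_len:
                    runs.append(current)
                current = []
            r += 1
            c += 1
        # Flush a run that reaches the image border.
        if len(current) >= min_len:
            runs.append(current)
    return runs


# Tiny checkerboard-like image: only the main diagonal is entirely white.
sample = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
print(diagonal_white_runs(sample))
```

With the default `min_len=2`, isolated white pixels in the corners are discarded and only the three-pixel main diagonal is reported; lowering `min_len` to 1 keeps them as single-pixel runs.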