Recursive X-Y cut using bounding boxes of connected components

Authors:
Jaekyu Ha; R. M. Haralick; I. T. Phillips
Venue:
ICDAR '95 Proceedings of the Third International Conference on Document Analysis and Recognition (Volume 2)
Year:
1995

Abstract

A top-down page segmentation technique known as the recursive X-Y cut decomposes a document image recursively into a set of rectangular blocks. This paper proposes implementing the recursive X-Y cut on the bounding boxes of connected components of black pixels instead of on the image pixels themselves. The advantage is a large reduction in computation: once the bounding boxes of the connected components are obtained, the recursive X-Y cut completes on the order of one second on Sparc-10 workstations for letter-sized document images scanned at 900 dpi resolution.
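The Python sketch below illustrates the idea. It is a minimal illustration under stated assumptions, not the authors' implementation. The connected-component bounding boxes are assumed to be already extracted. Each cut is placed at the widest empty gap in either projection. The thresholds `min_gap_x` and `min_gap_y` are hypothetical parameters, and the paper's own cut-selection rule and thresholds may differ. The sketch shows that the projection gaps needed for every cut come from the box extents alone, so no pixel is revisited after labeling.

```python
from typing import List, Tuple

# Bounding box of one connected component: (x0, y0, x1, y1), half-open,
# with x1 > x0 and y1 > y0.
Box = Tuple[int, int, int, int]


def find_gaps(boxes: List[Box], axis: int) -> List[Tuple[int, int]]:
    """Empty intervals between the boxes' projections on one axis.

    axis=0 projects onto x (candidate vertical cuts); axis=1 onto y
    (candidate horizontal cuts). Only box extents are used, so each call
    costs O(n log n) in the number of boxes, independent of image size.
    """
    spans = sorted((b[axis], b[axis + 2]) for b in boxes)
    gaps = []
    covered_to = spans[0][1]
    for start, end in spans[1:]:
        if start > covered_to:  # nothing covers (covered_to, start)
            gaps.append((covered_to, start))
        covered_to = max(covered_to, end)
    return gaps


def enclose(boxes: List[Box]) -> Box:
    """Smallest rectangle containing all the boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def xy_cut(boxes: List[Box], min_gap_x: int = 10,
           min_gap_y: int = 10) -> List[Box]:
    """Recursively split at the widest qualifying gap; return leaf blocks.

    min_gap_x / min_gap_y are illustrative thresholds (an assumption),
    not values taken from the paper.
    """
    if not boxes:
        return []
    best = None  # (gap width, axis, gap start, gap end)
    for axis, min_gap in ((1, min_gap_y), (0, min_gap_x)):
        for g0, g1 in find_gaps(boxes, axis):
            if g1 - g0 >= min_gap and (best is None or g1 - g0 > best[0]):
                best = (g1 - g0, axis, g0, g1)
    if best is None:
        return [enclose(boxes)]  # no gap is wide enough: leaf block
    _, axis, g0, g1 = best
    # No box crosses the gap, so every box lies wholly before or after it.
    before = [b for b in boxes if b[axis + 2] <= g0]
    after = [b for b in boxes if b[axis] >= g1]
    return (xy_cut(before, min_gap_x, min_gap_y)
            + xy_cut(after, min_gap_x, min_gap_y))
```

Consider two text columns separated by a 20-pixel gutter, with lines 10 pixels apart: `page = [(0, 0, 40, 10), (0, 20, 40, 30), (60, 0, 100, 10), (60, 20, 100, 30)]`. For this input, `xy_cut(page, min_gap_x=10, min_gap_y=15)` returns one block per column, `[(0, 0, 40, 30), (60, 0, 100, 30)]`, because the inter-line gaps fall below `min_gap_y`.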