The recursive X-Y cut is a top-down page segmentation technique that decomposes a document image recursively into a set of rectangular blocks. This paper proposes implementing the recursive X-Y cut on the bounding boxes of connected components of black pixels rather than on the image pixels themselves. The advantage is a substantial reduction in computation: once the bounding boxes of connected components have been obtained, the recursive X-Y cut completes in on the order of a second on Sparc-10 workstations for letter-sized document images scanned at 300 dpi resolution.
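The idea of cutting on bounding boxes instead of pixels can be illustrated with a minimal sketch. This is not the paper's implementation: the names `widest_gap`, `xy_cut`, and `min_gap`, the (x0, y0, x1, y1) box format, and the rule of always cutting at the single widest gap are assumptions made for illustration. The sketch also assumes the bounding boxes of connected components have already been computed by some other step.

```python
from typing import List, Optional, Tuple

# Hypothetical box format: (x0, y0, x1, y1) with x0 < x1 and y0 < y1.
Box = Tuple[int, int, int, int]


def widest_gap(boxes: List[Box], axis: int) -> Tuple[int, Optional[int]]:
    """Project boxes onto one axis (0 = x, 1 = y) and return
    (width, position) of the widest empty gap, or (0, None) if none."""
    intervals = sorted((b[axis], b[axis + 2]) for b in boxes)
    best_width, best_cut = 0, None
    reach = intervals[0][1]  # farthest coordinate covered so far
    for lo, hi in intervals[1:]:
        if lo - reach > best_width:
            best_width, best_cut = lo - reach, reach
        reach = max(reach, hi)
    return best_width, best_cut


def xy_cut(boxes: List[Box], min_gap: int = 10) -> List[Box]:
    """Recursively split boxes at the widest projection gap.
    Returns one enclosing rectangle per leaf block."""
    if not boxes:
        return []
    gap_x, cut_x = widest_gap(boxes, 0)
    gap_y, cut_y = widest_gap(boxes, 1)
    # Assumption: prefer whichever direction offers the wider gap.
    gap, cut, axis = max((gap_x, cut_x, 0), (gap_y, cut_y, 1),
                         key=lambda t: t[0])
    if cut is None or gap < min_gap:
        # No usable gap in either direction: this group is a leaf block.
        return [(min(b[0] for b in boxes), min(b[1] for b in boxes),
                 max(b[2] for b in boxes), max(b[3] for b in boxes))]
    # Every box lies entirely on one side of the gap.
    first = [b for b in boxes if b[axis + 2] <= cut]
    second = [b for b in boxes if b[axis + 2] > cut]
    return xy_cut(first, min_gap) + xy_cut(second, min_gap)


# A toy page: one header line above two columns of two lines each.
page = [
    (0, 0, 100, 10),
    (0, 30, 40, 40), (0, 42, 40, 52),
    (60, 30, 100, 40), (60, 42, 100, 52),
]
blocks = xy_cut(page, min_gap=10)
```

On this toy page the sketch separates the header from the body with a horizontal cut, then splits the body into two columns with a vertical cut. Each call works on a handful of intervals rather than on a pixel projection profile, so the cost depends on the number of connected components and not on the image resolution, which is the kind of saving the abstract describes.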