We present and analyze algorithms for the automated segmentation and classification of layout structures in electronic documents. The key idea is to use patterns in the distribution of white space in a document to recognize and interpret its components. The segmentation algorithm divides the document into a hierarchy of logical elements; the classification algorithms label these divisions as base text, tables, indented lists, polygonal drawings, or graphs. We present experimental data and discuss an information access application. Our methodology enables automatic markup of documents (for instance, in SGML) and the creation of multilevel indices and browsing tools for electronic libraries.
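The abstract does not spell out the algorithms, so the following is only a rough Python sketch of the general idea of reading structure from white space. It is not the authors' method. It assumes a plain-text document and treats runs of blank lines as horizontal white space: a wider gap (here, `section_gap` or more blank lines) separates higher-level divisions, while a narrower gap separates blocks within a division. It also treats columns that are blank in every line of a block as vertical white space, which it uses as a crude table cue. The names `segment`, `classify`, `interior_gaps`, `Block`, `Section`, the thresholds `section_gap` and `min_width`, and the label strings are all hypothetical. Polygonal drawings and graphs are not handled.

```python
import re
from dataclasses import dataclass, field

# Hypothetical list-item marker: a bullet or "1." / "1)" after optional indentation.
LIST_MARKER = re.compile(r"^\s*(?:[-*\u2022]|\d+[.)])\s+")


@dataclass
class Block:
    """A run of consecutive non-blank lines (hypothetical representation)."""
    lines: list[str]
    label: str = ""


@dataclass
class Section:
    """Blocks separated from other sections by a wider white-space gap."""
    blocks: list[Block] = field(default_factory=list)


def segment(text: str, section_gap: int = 2) -> list[Section]:
    """Build a two-level hierarchy from runs of blank lines.

    A run of `section_gap` or more blank lines starts a new section;
    a shorter run only starts a new block inside the current section.
    """
    sections = [Section()]
    current: list[str] = []
    blank_run = 0
    for line in text.splitlines():
        if line.strip() == "":
            blank_run += 1
            continue
        # The first non-blank line after a gap closes the pending block.
        if blank_run and current:
            sections[-1].blocks.append(Block(current))
            current = []
            if blank_run >= section_gap:
                sections.append(Section())
        blank_run = 0
        current.append(line.rstrip().expandtabs())
    if current:
        sections[-1].blocks.append(Block(current))
    return [s for s in sections if s.blocks]


def interior_gaps(lines: list[str], min_width: int = 2) -> list[tuple[int, int]]:
    """Return [start, end) column runs that are blank in every line,
    lie between occupied columns, and are at least `min_width` wide."""
    width = max(len(line) for line in lines)
    padded = [line.ljust(width) for line in lines]
    blank = [all(row[c] == " " for row in padded) for c in range(width)]
    occupied = [c for c in range(width) if not blank[c]]
    gaps, start = [], None
    for c in range(occupied[0], occupied[-1] + 1):
        if blank[c] and start is None:
            start = c
        elif not blank[c] and start is not None:
            if c - start >= min_width:
                gaps.append((start, c))
            start = None
    return gaps


def classify(block: Block) -> str:
    """Heuristic labels only; drawings and graphs are not handled here."""
    lines = block.lines
    if sum(bool(LIST_MARKER.match(line)) for line in lines) >= 2:
        return "indented-list"
    if len(lines) >= 2 and interior_gaps(lines):
        return "table"  # a vertical white-space channel runs through the block
    return "base-text"


doc = """Introduction
This paragraph is ordinary running text
that wraps onto a second line.


Name    Qty   Price
apple   3     0.50
pear    10    0.75

  - first item
  - second item
"""

for i, section in enumerate(segment(doc)):
    for block in section.blocks:
        block.label = classify(block)
        print(i, block.label, block.lines[0].strip())
```

Here the two blank lines after the first paragraph open a second section, and the single blank line before the list only starts a new block within that section. This gap-width rule is what produces a hierarchy rather than a flat sequence. A fuller treatment would need more levels and better-tuned thresholds.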