Structure analysis and generation for internet documents

  • Authors:
  • Kyong Ho Lee; Yoon Chul Choy; Sung-Bae Cho

  • Affiliations:
  • National Institute of Standards and Technology, Gaithersburg, MD; Dept. of Computer Science, Yonsei University, Seoul, 120-749, Korea; Dept. of Computer Science, Yonsei University, Seoul, 120-749, Korea

  • Venue:
  • Intelligent exploration of the web
  • Year:
  • 2003

Abstract

This paper presents a syntactic method for logical structure analysis and generation, aimed at creating Web documents. The method transforms multi-page document images with a hierarchical structure into an XML document. Previous methods take text lines as their basic units. To produce the logical structure more accurately and quickly than those methods, the proposed method instead takes hierarchically structured text regions as input. Furthermore, we define a document model that efficiently describes the geometric characteristics and logical structure of a document class. Experimental results on 372 images scanned from a technical journal show that the method performed logical structure analysis successfully. In particular, because the method outputs XML documents as the result of structure analysis, it improves document reusability and platform independence.
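The abstract does not specify the form of the document model. As a rough illustration only, the Python sketch below makes several assumptions. It uses a hypothetical `Region` type for hierarchical text regions and a toy rule table (`DOCUMENT_MODEL`) that maps one geometric feature, average character height, to logical labels. It then serializes the labeled hierarchy to XML with the standard library's `xml.etree.ElementTree`. The names, thresholds, and labels are invented for illustration and are not the authors' model. The paper's syntactic analysis is considerably richer than a first-match rule table.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Region:
    """A text region from layout analysis (hypothetical representation)."""
    bbox: tuple[int, int, int, int]  # (x, y, width, height) in pixels
    char_height: float               # average character height in pixels
    text: str = ""                   # recognized text; empty for container regions
    children: list[Region] = field(default_factory=list)


# Hypothetical document model: each logical label is paired with a geometric
# predicate. Rules are tried in order and the first match wins.
DOCUMENT_MODEL: list[tuple[str, Callable[[Region], bool]]] = [
    ("title", lambda r: r.char_height >= 20),
    ("author", lambda r: 14 <= r.char_height < 20),
    ("paragraph", lambda r: r.char_height < 14),
]


def label(region: Region) -> str:
    """Assign a logical label to a region using the first matching rule."""
    for name, predicate in DOCUMENT_MODEL:
        if predicate(region):
            return name
    return "unknown"


def to_xml(region: Region, tag: str) -> ET.Element:
    """Recursively convert a region and its labeled children to XML elements."""
    elem = ET.Element(tag)
    if region.text:
        elem.text = region.text
    for child in region.children:
        elem.append(to_xml(child, label(child)))
    return elem


def pages_to_xml(pages: list[Region]) -> str:
    """Combine the regions of several pages under a single <article> root."""
    root = ET.Element("article")
    for page in pages:
        root.append(to_xml(page, "page"))
    return ET.tostring(root, encoding="unicode")


# Usage: one page containing three text regions of different character heights.
page = Region((0, 0, 2480, 3508), 0, children=[
    Region((200, 150, 2000, 120), 24, "Structure analysis"),
    Region((200, 300, 2000, 60), 16, "K. H. Lee"),
    Region((200, 400, 2000, 900), 10, "This paper presents a syntactic method."),
])
print(pages_to_xml([page]))
```

Working from regions rather than individual text lines keeps the tree shallow, which is in the spirit of the abstract's claim that region-level input speeds up analysis. This toy version, however, ignores logical components that span pages, which a real multi-page method must handle.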