This paper presents an efficient method for extracting the logical structure of a Web document. The method consists of three phases: visual grouping, element identification, and logical grouping. To produce a more accurate logical structure, it defines a document model that describes the logical structure information of a specific document class. Because the method relies on both the visual structure produced by the visual grouping phase and this document model, it supports sophisticated structure analysis. Experimental results on HTML documents from the Web show that the method performs logical structure analysis successfully in comparison with previous work. In particular, the method outputs the result of structure analysis as XML documents, which enhances the reusability of the documents.
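The three phases can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the `Block` features (font size, vertical position), the `DOCUMENT_MODEL` rules, the pixel gap, and the output element names are all hypothetical stand-ins chosen only to show how a visual grouping step, a model-driven identification step, and a logical grouping step might feed one another and end in XML.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Block:
    """A rendered text fragment; the features are assumptions for illustration."""
    text: str
    font_size: int  # hypothetical visual cue
    y: int          # hypothetical vertical position in pixels


# Hypothetical document model for one document class: label -> rule on a block.
DOCUMENT_MODEL = {
    "title": lambda b: b.font_size >= 20,
    "heading": lambda b: 14 <= b.font_size < 20,
}


def visual_grouping(blocks, gap=30):
    """Phase 1: merge vertically close blocks that share a font size."""
    groups = []
    for b in sorted(blocks, key=lambda blk: blk.y):
        last = groups[-1][-1] if groups else None
        if last and b.y - last.y < gap and b.font_size == last.font_size:
            groups[-1].append(b)
        else:
            groups.append([b])
    return groups


def identify(group):
    """Phase 2: label a visual group via the document model (default: paragraph)."""
    for label, rule in DOCUMENT_MODEL.items():
        if rule(group[0]):
            return label
    return "paragraph"


def logical_grouping(groups):
    """Phase 3: nest labelled groups; each heading opens a new section."""
    root = ET.Element("document")
    section = None
    for group in groups:
        label = identify(group)
        text = " ".join(b.text for b in group)
        if label == "title":
            ET.SubElement(root, "title").text = text
        elif label == "heading":
            section = ET.SubElement(root, "section")
            ET.SubElement(section, "heading").text = text
        else:
            parent = section if section is not None else root
            ET.SubElement(parent, "paragraph").text = text
    return root


def extract_structure(blocks):
    """Run the three phases and serialize the logical structure as XML."""
    return ET.tostring(logical_grouping(visual_grouping(blocks)), encoding="unicode")


if __name__ == "__main__":
    sample = [
        Block("Web Mining", 24, 0),
        Block("Intro", 16, 60),
        Block("First line", 12, 100),
        Block("second line", 12, 115),
    ]
    print(extract_structure(sample))
```

In this sketch the document model is a plain dictionary of rules, so supporting another document class would mean swapping in a different rule set while leaving the three phases unchanged.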