The Web is the richest source of information and knowledge. Unfortunately, the current structure of Web pages makes it difficult for users to retrieve that information or knowledge systematically. In this paper, we propose a tree-based personal Web information/knowledge retrieval system that extracts the structured parts of Web pages. First, we obtain the layout pattern and the paths of the parts to be extracted from a typical Web page of a target site. Then, we use the recorded layout pattern and paths to extract the structured parts from the remaining Web pages of that site. We demonstrate the usefulness of our approach with the results of extracting structured parts from notable Web pages.
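The two-step idea described above (record paths on one typical page, then replay them on the other pages of the same site) can be illustrated with a minimal sketch. This is not the system from the paper: the function names `record_path` and `extract`, the (tag, index) path representation, and the two sample pages are all assumptions made for illustration, and the pages are assumed to be well-formed XHTML so that Python's standard `xml.etree.ElementTree` parser can read them.

```python
import xml.etree.ElementTree as ET

def record_path(root, sample_text):
    """Find the first element whose text equals sample_text and return
    the path to it as a list of (tag, index-among-same-tag-siblings) steps.
    Hypothetical helper; the path format is an assumption of this sketch."""
    def walk(node, path):
        if (node.text or "").strip() == sample_text:
            return path
        counts = {}
        for child in node:
            idx = counts.get(child.tag, 0)
            counts[child.tag] = idx + 1
            found = walk(child, path + [(child.tag, idx)])
            if found is not None:
                return found
        return None
    return walk(root, [])

def extract(root, path):
    """Replay a recorded path on another page with the same layout.
    Returns the element text, or None if the layout does not match."""
    node = root
    for tag, idx in path:
        same_tag = [c for c in node if c.tag == tag]
        if idx >= len(same_tag):
            return None
        node = same_tag[idx]
    return (node.text or "").strip()

# Invented sample pages that share one layout (assumption for the demo).
TYPICAL = ("<html><body><div><h1>Site</h1></div>"
           "<div><h2>First headline</h2><p>First body</p></div></body></html>")
OTHER = ("<html><body><div><h1>Site</h1></div>"
         "<div><h2>Second headline</h2><p>Second body</p></div></body></html>")

# Step 1: record the path on the typical page.
headline_path = record_path(ET.fromstring(TYPICAL), "First headline")
# Step 2: reuse the recorded path on another page of the same site.
headline = extract(ET.fromstring(OTHER), headline_path)
print(headline_path, headline)
```

Real Web pages are rarely well-formed XML, so a practical version would likely need a tolerant HTML parser and some way to cope with layout variation between pages; the sketch only shows the record-then-replay principle.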