An Informative DOM Subtree Identification Method from Web Pages in Unfamiliar Web Sites

Authors:
Masanobu Tsuruta;Hiroyuki Sakai;Shigeru Masuyama
Venue:
IEICE - Transactions on Information and Systems
Year:
2008

Citing 3

DOM-based content extraction of HTML documents
WWW '03 Proceedings of the 12th international conference on World Wide Web

Learning block importance models for web pages
Proceedings of the 13th international conference on World Wide Web

Automatic Identification of Informative Sections of Web Pages
IEEE Transactions on Knowledge and Data Engineering

Cited 1

Using web page layout for extraction of sender names
Proceedings of the 3rd International Universal Communication Symposium

Abstract

We propose a method for identifying informative DOM (Document Object Model) subtrees in a Web page from an unfamiliar Web site. Our method uses the layout data of DOM nodes generated by a generic Web browser. The results show that our method outperforms a baseline method and robustly identifies informative DOM subtrees in Web pages.