We propose a method for identifying informative DOM (Document Object Model) subtrees in Web pages from unfamiliar Web sites. Our method uses the layout data of DOM nodes generated by a generic Web browser. The results show that our method outperforms a baseline method and identifies informative DOM subtrees in Web pages robustly.