Little work has been done on an integrated statistical model that both understands webpage structure and processes the natural language sentences within HTML elements. This paper proposes a novel framework called WebNLP, which enables bidirectional integration of page structure understanding and text understanding in an iterative manner. Experiments show that the WebNLP framework achieves significantly better performance.