This paper proposes a learning approach for discovering the semantic structure of web pages. The task consists of partitioning the text of a web page into information blocks and identifying the semantic category of each block. We employed two machine learning techniques, AdaBoost and support vector machines (SVMs), to learn from a labeled corpus of web pages. We evaluated our approach on general web pages from the World Wide Web and obtained encouraging results. This work can benefit a number of web-driven applications, such as search engines, web-based question answering, web-based data mining, and voice-enabled web navigation.
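The abstract does not state the segmentation rules, block features, or weak learners used. The sketch below is therefore illustrative only and is not the paper's method. It assumes that blocks are split at a hypothetical set of block-level HTML tags. It assumes that each block is described by four hypothetical features: heading tag, word count, terminal punctuation, and fraction of capitalized words. It then trains a binary AdaBoost classifier with decision stumps to separate "heading" blocks (+1) from "body" blocks (-1). The paper's multi-category setting and its SVM variant are not shown.

```python
import math
from html.parser import HTMLParser

# Hypothetical block-boundary tags (an assumption, not taken from the paper).
BLOCK_TAGS = {"p", "div", "li", "td", "tr", "table", "ul", "ol", "br",
              "h1", "h2", "h3", "h4", "h5", "h6"}
HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class BlockSegmenter(HTMLParser):
    """Collects (tag, text) blocks, starting a new block at each block-level tag."""

    def __init__(self):
        super().__init__()
        self.blocks, self._buf, self._tag = [], [], None

    def _flush(self):
        text = " ".join("".join(self._buf).split())  # normalize whitespace
        if text:
            self.blocks.append((self._tag, text))
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()
            self._tag = tag

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()
            self._tag = None

    def handle_data(self, data):
        self._buf.append(data)


def segment(html):
    parser = BlockSegmenter()
    parser.feed(html)
    parser.close()
    parser._flush()  # emit any trailing text
    return parser.blocks


def features(tag, text):
    """Hypothetical block features; text is non-empty after segmentation."""
    words = text.split()
    return [1.0 if tag in HEADING_TAGS else 0.0,
            float(len(words)),
            1.0 if text.endswith((".", "?", "!")) else 0.0,
            sum(w[0].isupper() for w in words) / len(words)]


def stump(x, j, t, p):
    # Decision stump: predict polarity p if feature j exceeds threshold t.
    return p if x[j] > t else -p


def train_adaboost(X, y, rounds=10):
    """Binary AdaBoost (labels in {+1, -1}) with exhaustive stump search."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        best = None
        for j in range(len(X[0])):
            for t in sorted({x[j] for x in X}):
                for p in (1, -1):
                    err = sum(wi for xi, yi, wi in zip(X, y, w)
                              if stump(xi, j, t, p) != yi)
                    if best is None or err < best[0]:
                        best = (err, j, t, p)
        err, j, t, p = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) on perfect stumps
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, t, p))
        # Re-weight: misclassified blocks gain weight, correct ones lose it.
        w = [wi * math.exp(-alpha * yi * stump(xi, j, t, p))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, x):
    score = sum(a * stump(x, j, t, p) for a, j, t, p in model)
    return 1 if score >= 0 else -1


SAMPLE_HTML = """<html><body>
<h1>Semantic Structure</h1>
<p>Web pages contain many blocks of text that serve different purposes.</p>
<h2>Related Work</h2>
<p>Earlier systems relied on hand-written rules for each site.</p>
</body></html>"""

if __name__ == "__main__":
    blocks = segment(SAMPLE_HTML)
    model = train_adaboost([features(t, s) for t, s in blocks], [1, -1, 1, -1])
    print(predict(model, features("h3", "Conclusion")))
```

A real system would add more categories (for example with one-vs-rest boosting or a multiclass SVM) and richer layout and linguistic features. The sketch exhaustively searches stump thresholds only because the toy corpus is tiny.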