Capturing Semantics in HTML Documents

Authors:
Mengchi Liu
Venue:
DEXA '02 Proceedings of the 13th International Conference on Database and Expert Systems Applications
Year:
2002

Abstract

Most documents available over the web conform to the HTML specification. They are intended to be human-readable through a web browser and thus are constructed following some common conventions. Based on these conventions, the Conceptual Model for HTML was recently proposed to automatically capture the hierarchical structure within web documents. However, certain key semantic information about the contents of the documents, which is obvious to humans, is often omitted. As a result, web data processing, manipulation, and integration are still quite difficult. In this paper, we discuss how to extend the Conceptual Model for HTML to capture the intended semantics of HTML documents. We show that with the new constructs introduced, an Intelligent Wrapper, and limited human interaction, semantics can be transferred from humans into the Extended Conceptual Model so that further meaningful processing, manipulation, and integration of web documents become possible.