Ontology-based HTML to XML conversion

Authors:
Shijun Li; Weijie Ou; Junqing Yu
Affiliations:
School of Computer, Wuhan University, Wuhan, China; School of Computer, Wuhan University, Wuhan, China; College of Computer Science & Technology, Huazhong University of Science & Technology, Wuhan, China
Venue:
WAIM'05 Proceedings of the 6th international conference on Advances in Web-Age Information Management
Year:
2005

Abstract

Current wrapper-based approaches break down when extracting data from Web pages that are structured differently or change frequently. To address this, the paper defines a domain-specific ontology, automatically captures the semantic hierarchy of Web pages by exploiting both structural information and common formatting information, and recognizes and extracts data through ontology-based semantic matching, without relying on page-specific formatting. The approach therefore adapts to differently structured and frequently changing Web pages within a domain of interest.
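
To make the idea of ontology-based matching concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a toy ontology (the hypothetical `ONTOLOGY` dictionary, mapping each concept to label synonyms) and treats text inside common emphasis or header tags (`<b>`, `<strong>`, `<th>`, `<dt>`) as a label for the text that follows. The names `match_concept`, `LabelValueParser`, and `html_to_xml` are illustrative only, and the sketch produces a flat record rather than the semantic hierarchy the abstract describes.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical toy ontology: concept name -> label synonyms seen on pages.
ONTOLOGY = {
    "name": {"name", "full name"},
    "email": {"email", "e-mail", "mail"},
    "phone": {"phone", "tel", "telephone"},
}


def match_concept(label):
    """Return the concept whose synonyms contain the label, else None."""
    key = label.strip().rstrip(":").strip().lower()
    for concept, synonyms in ONTOLOGY.items():
        if key in synonyms:
            return concept
    return None


class LabelValueParser(HTMLParser):
    """Collect (label, value) pairs using common formatting cues.

    Text inside a label tag is taken as a label; the next non-empty
    text outside a label tag is taken as its value.
    """

    LABEL_TAGS = {"b", "strong", "th", "dt"}

    def __init__(self):
        super().__init__()
        self.in_label = False
        self.current_label = None
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag in self.LABEL_TAGS:
            self.in_label = True

    def handle_endtag(self, tag):
        if tag in self.LABEL_TAGS:
            self.in_label = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self.in_label:
            self.current_label = text
        elif self.current_label is not None:
            self.pairs.append((self.current_label, text))
            self.current_label = None


def html_to_xml(html, root_tag="record"):
    """Emit XML whose element names come from the ontology, not the page."""
    parser = LabelValueParser()
    parser.feed(html)
    parser.close()
    root = ET.Element(root_tag)
    for label, value in parser.pairs:
        concept = match_concept(label)
        if concept is not None:  # labels outside the ontology are skipped
            ET.SubElement(root, concept).text = value
    return ET.tostring(root, encoding="unicode")
```

Under these assumptions, a paragraph-based page such as `<p><b>Name:</b> Ada Lovelace</p>` and a table-based page such as `<tr><th>Full name</th><td>Ada Lovelace</td></tr>` both yield a `<name>` element, because the output is keyed on the matched concept and not on the page's markup.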