A Case-Based Transformation from HTML to XML

Authors:
Masayuki Umehara;Koji Iwanuma
Venue:
IDEAL '00 Proceedings of the Second International Conference on Intelligent Data Engineering and Automated Learning, Data Mining, Financial Engineering, and Intelligent Agents
Year:
2000

Citing (3):
Artificial intelligence: a modern approach
Regression testing for wrapper maintenance. AAAI '99/IAAI '99 Proceedings of the sixteenth national conference on Artificial intelligence and the eleventh Innovative applications of artificial intelligence conference
Introduction to Modern Information Retrieval

Cited by (1):
A Case-Based Recognition of Semantic Structures in HTML Documents. IDEAL '02 Proceedings of the Third International Conference on Intelligent Data Engineering and Automated Learning


Abstract

Recently, a huge quantity of HTML documents has been created on the Internet, and together they constitute a treasury of information. HTML, however, is designed mainly for reading with browsers and is not well suited to machine processing; XML was proposed as a solution to this problem. In this paper, we give a case-based method for transforming HTML documents into XML documents. Actual Web sites contain many series of HTML pages, and the pages within a series usually have very similar structures. A case-based transformation is therefore a promising practical method for semi-automatic transformation from HTML to XML. In experimental evaluations, the method achieved highly accurate transformation: 85% of 80 actual pages were transformed correctly.
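
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that a "case" can be represented as a hand-made mapping from HTML tag paths to XML element names, taken from one already-converted page, and that other pages in the same series share those tag paths. The names `PathTextExtractor`, `extract`, `transform`, and the sample `case` are hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that never have a closing tag; they are not pushed on the path stack.
VOID_TAGS = {"br", "hr", "img", "meta", "link", "input"}


class PathTextExtractor(HTMLParser):
    """Collects (tag path, text) pairs such as ("html/body/h1", "Title")."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, outermost first
        self.items = []   # extracted (path, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed inner tags.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.items.append(("/".join(self.stack), text))


def extract(html):
    """Return the (tag path, text) pairs found in an HTML string."""
    parser = PathTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


def transform(html, case, root_name="page"):
    """Build XML from a page by reusing the path-to-element mapping of a case.

    Assumption: `case` maps tag paths to XML element names; text found at
    paths that the case does not mention is dropped.
    """
    root = ET.Element(root_name)
    for path, text in extract(html):
        name = case.get(path)
        if name is not None:
            ET.SubElement(root, name).text = text
    return ET.tostring(root, encoding="unicode")


# Hypothetical case, written by hand for one page of a series.
case = {"html/body/h1": "title", "html/body/p": "summary"}

# Another page of the same series, with the same structure.
page = "<html><body><h1>Second page</h1><p>Other text</p></body></html>"
print(transform(page, case))
# <page><title>Second page</title><summary>Other text</summary></page>
```

Exact path matching is the simplest possible notion of "similar structure". It fails as soon as a page in the series deviates from the case, which is one reason a realistic system would need a more tolerant similarity measure and would stay semi-automatic.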