Artificial intelligence: a modern approach
Regression testing for wrapper maintenance
AAAI '99/IAAI '99 Proceedings of the sixteenth national conference on Artificial intelligence and the eleventh Innovative applications of artificial intelligence conference
Introduction to Modern Information Retrieval
A Case-Based Recognition of Semantic Structures in HTML Documents
IDEAL '02 Proceedings of the Third International Conference on Intelligent Data Engineering and Automated Learning
Recently, a huge quantity of HTML documents has been created on the Internet, constituting a treasury of information. HTML, however, is designed mainly for reading in browsers and is not well suited to machine processing; XML was proposed as a solution to this problem. In this paper, we present a case-based method for transforming HTML documents into XML. Actual Web sites contain many series of HTML pages, and the pages within a series usually have very similar structures. A case-based approach is therefore a promising practical method for semi-automatic transformation from HTML to XML. Through experimental evaluation, we show that the method achieves highly accurate transformation: 85% of 80 actual pages were transformed correctly.
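To make the idea of reusing one page of a series as a "case" concrete, here is a minimal sketch in Python. It is not the method evaluated in the paper, whose details the abstract does not give. It assumes that a case can be approximated as a mapping from an HTML tag path to an XML element name, built from one page whose text items were labelled by hand. The names `build_case`, `apply_case`, and `PathTextExtractor`, as well as the sample pages, are hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that never have a closing tag; kept off the path stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class PathTextExtractor(HTMLParser):
    """Collect (tag path, text) pairs, e.g. ('html/body/h1', 'Title')."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop up to and including the matching tag, tolerating unclosed tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.items.append(("/".join(self.stack), text))


def extract(html):
    parser = PathTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


def build_case(html, labels):
    """Build a case from one page; `labels` maps text to an XML element name."""
    return {path: labels[text] for path, text in extract(html) if text in labels}


def apply_case(case, html, root_name="record"):
    """Transform a structurally similar page into XML using a stored case."""
    root = ET.Element(root_name)
    for path, text in extract(html):
        if path in case:
            ET.SubElement(root, case[path]).text = text
    return ET.tostring(root, encoding="unicode")


# One hand-labelled page of a series serves as the case.
case_page = "<html><body><h1>Widget A</h1><p>10 USD</p></body></html>"
case = build_case(case_page, {"Widget A": "name", "10 USD": "price"})

# Another page of the same series is transformed without further labelling.
new_page = "<html><body><h1>Widget B</h1><p>25 USD</p></body></html>"
print(apply_case(case, new_page))
# → <record><name>Widget B</name><price>25 USD</price></record>
```

Matching on exact tag paths is the simplest possible similarity test and fails as soon as a page deviates from the case. A practical system would need a more tolerant comparison, which is presumably part of why the transformation is described as semi-automatic.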