Template-based wrappers in the TSIMMIS system
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Modeling Web sources for information integration
AAAI '98/IAAI '98 Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence
A hierarchical approach to wrapper induction
Proceedings of the third annual conference on Autonomous Agents
ACM SIGMOD Record
XWRAP: An XML-Enabled Wrapper Construction System for Web Information Sources
ICDE '00 Proceedings of the 16th International Conference on Data Engineering
Hi-index | 0.00 |
With the development of the Internet, the Web has become invaluable information source. In order to use this information for more than human browsing, web pages in HTML must be converted into a format meaningful to software programs. Wrappers have been a useful technique to convert HTML documents into semantically meaningful XML files. In this paper, we propose a data extraction approach based on extracting schema, which generates automatically a wrapper to extract data from an HTML document, and produces an XML document conforming to given DTD. After the user defines extraction data schema in the form of DTD, the wrapper is generated automatically with the induction and leaning algorithm. The experiment indicates that the approach can correctly extract the required data from the source document with high accuracy.