Traditional search engines ignore the tremendous amount of information "hidden" behind the search forms of Web pages in large searchable electronic databases, collectively called the hidden Web. In this paper, we address the problem of designing a system for extracting and retrieving hidden Web information. We present a generic operational model of hidden Web information retrieval and describe its key techniques. We introduce a new Tag-Tree-based Object Extraction Technique for automatically extracting hidden Web information from Web pages. Based on this technique, we implement a retrieval algorithm for structured queries over hidden Web information. Test results are also reported.