Template-based wrappers in the TSIMMIS system
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
The Araneus Web-based management system
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Information gathering in the World-Wide Web: the W3QL query language and the W3QS system
ACM Transactions on Database Systems (TODS)
A layered architecture for querying dynamic Web content
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Focused crawling: a new approach to topic-specific Web resource discovery
WWW '99 Proceedings of the eighth international conference on World Wide Web
Machine Learning for Information Extraction in Informal Domains
Machine Learning - Special issue on information retrieval
Wrapper induction: efficiency and expressiveness
Artificial Intelligence - Special issue on Intelligent internet systems
Template detection via data mining and its applications
Proceedings of the 11th international conference on World Wide Web
Hierarchical Wrapper Induction for Semistructured Information Sources
Autonomous Agents and Multi-Agent Systems
Focused Crawling Using Context Graphs
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Visual Web Information Extraction with Lixto
Proceedings of the 27th International Conference on Very Large Data Bases
RoadRunner: Towards Automatic Data Extraction from Large Web Sites
Proceedings of the 27th International Conference on Very Large Data Bases
A Machine Learning Approach to Building Domain-Specific Search Engines
IJCAI '99 Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence
Probe, Cluster, and Discover: Focused Extraction of QA-Pagelets from the Deep Web
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Data & Knowledge Engineering
Extracting lists of data records from semi-structured web pages
Data & Knowledge Engineering
Finding and Extracting Data Records from Web Pages
Journal of Signal Processing Systems
Detecting semantically correct changes to relevant unordered hidden web data
DEXA'05 Proceedings of the 16th international conference on Database and Expert Systems Applications
Hi-index | 0.00 |
In this paper, we address the problem of extracting and transforming dynamically generated hyperlinked hidden web query results to XML. Our approach is based on the STALKER approach. As STALKER was designed to extract data from a single web page, it cannot handle a set of hyperlinked pages. We propose an algorithm called HW-Transform for transforming hidden web query results (also called QURE-Pagelets) to XML format using machine learning by extending STALKER to handle hyperlinked hidden web pages. One of the key features of our approach is that we identify and transform key attributes of query results into XML attributes. These key attributes facilitate applications such as change detection and data integration by efficiently identifying related or identical results. Based on the proposed algorithm, we have implemented a prototype system called HW-STALKER using Java. Our experiments demonstrate that HW-Transform shows acceptable performance for transforming QURE-Pagelets to XML.