In the literature, data extraction techniques for HTML and, more generally, semi-structured data have been studied exhaustively, and a number of automatic and semi-automatic approaches have been proposed. However, in real-life scenarios data extraction capabilities are only one half of the game. Password-protected sites, cookies, non-HTML data formats, JavaScript, session IDs, Web form iterations, and dynamic changes on Web sites are the obstacles that make Web data extraction difficult in real-life application scenarios. Based on current Lixto technology, we propose a novel approach that introduces action-based recording and replaying of Web navigation sequences and integrates it closely with extraction technologies. On the one hand, the technical innovation is the embedding of the Mozilla browser into the Lixto Visual Wrapper, which brings support for a large number of Web standards and an open-source API that permits close interaction between Lixto and Mozilla. On the other hand, we develop a navigation language and explore its close interaction with Elog, the extraction language of Lixto. We describe the current research status and give sample screenshots. The paper closes with a description of two application domains where Deep Web navigation capabilities play a crucial role: automotive B2B Web platforms and Business Intelligence scenarios.
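To make the idea of action-based recording and replaying of navigation sequences concrete, here is a minimal Python sketch. It does not reproduce Lixto's navigation language or its Mozilla integration. The action types (`Load`, `Fill`, `Submit`), the `Recorder` and `replay` helpers, the `FakeBrowser` class that simulates a password-protected site, and the example URL are all hypothetical. Actions are logged while a user performs them and are later replayed in a fresh session, so that the login (and its session cookie) is re-established before extraction starts.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


# Hypothetical action vocabulary; a real navigation language would be richer.
@dataclass(frozen=True)
class Load:
    url: str

@dataclass(frozen=True)
class Fill:
    field: str
    value: str

@dataclass(frozen=True)
class Submit:
    pass

Action = Union[Load, Fill, Submit]


class FakeBrowser:
    """Hypothetical stand-in for an embedded browser, simulating a login-protected site."""

    def __init__(self) -> None:
        self.url = ""
        self.form: Dict[str, str] = {}
        self.cookies: Dict[str, str] = {}

    def load(self, url: str) -> None:
        self.url = url
        self.form = {}

    def fill(self, field: str, value: str) -> None:
        self.form[field] = value

    def submit(self) -> None:
        # The fake site issues a session cookie only for the correct password.
        if self.url.endswith("/login") and self.form.get("password") == "secret":
            self.cookies["session"] = "abc123"
            self.url = self.url.replace("/login", "/prices")
        self.form = {}

    def page(self) -> str:
        # Protected content is visible only with a valid session cookie.
        if self.url.endswith("/prices") and "session" in self.cookies:
            return "<table><tr><td>widget</td><td>9.99</td></tr></table>"
        return "<form>login</form>"


class Recorder:
    """Wraps a browser and logs each user action as it is performed."""

    def __init__(self, browser: FakeBrowser) -> None:
        self.browser = browser
        self.actions: List[Action] = []

    def load(self, url: str) -> None:
        self.actions.append(Load(url))
        self.browser.load(url)

    def fill(self, field: str, value: str) -> None:
        self.actions.append(Fill(field, value))
        self.browser.fill(field, value)

    def submit(self) -> None:
        self.actions.append(Submit())
        self.browser.submit()


def replay(actions: List[Action], browser: FakeBrowser) -> str:
    """Re-executes a recorded sequence and returns the final page for extraction."""
    for action in actions:
        if isinstance(action, Load):
            browser.load(action.url)
        elif isinstance(action, Fill):
            browser.fill(action.field, action.value)
        elif isinstance(action, Submit):
            browser.submit()
    return browser.page()


# Record the navigation once, interactively...
rec = Recorder(FakeBrowser())
rec.load("https://example.com/login")
rec.fill("user", "alice")
rec.fill("password", "secret")
rec.submit()

# ...then replay it in a new session before each extraction run.
html = replay(rec.actions, FakeBrowser())
print(html)
```

In this sketch the stored artifact is the action sequence rather than the final URL: a session ID captured in a URL or cookie would eventually expire, whereas replaying the login steps obtains a fresh one. The page returned by `replay` is what an extraction program would then process.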