Automation of the deep web with user defined behaviours

Authors:
Vicente Luque Centeno;Carlos Delgado Kloos;Peter T. Breuer;Luis Sánchez Fernández;Ma. Eugenia Gonzalo Cabellos;Juan Antonio Herráiz Pérez
Affiliations:
Departamento de Ingeniería Telemática, Universidad Carlos III de Madrid, Madrid, Spain (all authors)
Venue:
AWIC'03 Proceedings of the 1st international Atlantic web intelligence conference on Advances in web intelligence
Year:
2003

Citing 9
Cited 0

WebL - a programming language for the Web
WWW7 Proceedings of the seventh international conference on World Wide Web 7

Clean up your Web pages with HP's HTML tidy
WWW7 Proceedings of the seventh international conference on World Wide Web 7

Service Combinators for Web Computing
IEEE Transactions on Software Engineering

Effective Web data extraction with standard XML technologies
Proceedings of the 10th international conference on World Wide Web

Modern Information Retrieval

Hierarchical Wrapper Induction for Semistructured Information Sources
Autonomous Agents and Multi-Agent Systems

Building Light-Weight Wrappers for Legacy Web Data-Sources Using W4F
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases

Crawling the Hidden Web
Proceedings of the 27th International Conference on Very Large Data Bases

XWRAP: An XML-Enabled Wrapper Construction System for Web Information Sources
ICDE '00 Proceedings of the 16th International Conference on Data Engineering

Abstract

Giving semantics to Web data is an issue for automated Web navigation. Because legacy Web pages have for years been built with HTML as a visualization-oriented markup, data on the Web is suitable for people using browsers, but not for programs that automatically perform a task on the Web on behalf of their users. The W3C Semantic Web initiative [16] tries to solve this by explicitly declaring semantic descriptions in metadata (typically RDF [19] and OWL [23]) associated with Web pages and ontologies combined with semantic rules. This way, inference-enabled agents may deduce which actions (links to be followed, forms to be filled, ...) should be executed in order to retrieve the results for a user's query. However, automating tasks on the Web requires more than inferring how to retrieve information from it. Information retrieval [3] is only the first step. Other actions, such as relevant data extraction, data homogenization, and user-definable processing, are also needed to automate Web-enabled applications running on Web servers. This paper proposes two programming languages for instructing assistants on how to explore Web sites according to the user's aims, and provides a real example from the legacy deep Web.