As the Web grows, more and more data has become available through dynamic forms of publication, such as a legacy database accessed through an HTML form (the so-called Hidden Web). Integrating this data increasingly depends on the fast generation of page-fetching agents, so there is a growing need for tools that help users generate such agents. In this paper, we describe an approach to automatically generating agents that collect hidden Web pages. The approach uses a pre-existing data repository to identify the contents of these pages and takes advantage of regularities that can be found among Web sites. To demonstrate the effectiveness of our approach, we discuss the results of a number of experiments carried out with sites from different domains. We also discuss how such regularities among sites can be formalized.
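
The role of the data repository can be illustrated with a minimal sketch. It is not the method of the paper, only one plausible reading of "identifying the contents of these pages": a fetched page is treated as a data page when enough known repository values occur in its text. The function names, the token-matching rule, and the 0.5 threshold are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Strip HTML tags, lowercase, and return the set of alphanumeric tokens."""
    without_tags = re.sub(r"<[^>]+>", " ", text)
    return set(re.findall(r"[a-z0-9]+", without_tags.lower()))


def repository_coverage(page_text, repository_values):
    """Fraction of repository values whose tokens all occur in the page.

    Hypothetical scoring rule: a value counts as found when every one of
    its tokens appears somewhere in the page, regardless of order.
    """
    if not repository_values:
        return 0.0
    page_tokens = tokenize(page_text)
    hits = 0
    for value in repository_values:
        value_tokens = tokenize(value)
        # Skip values with no usable tokens so they never count as hits.
        if value_tokens and value_tokens <= page_tokens:
            hits += 1
    return hits / len(repository_values)


def looks_like_data_page(page_text, repository_values, threshold=0.5):
    """Assumed decision rule: accept the page if coverage meets the threshold."""
    return repository_coverage(page_text, repository_values) >= threshold
```

An agent generator could use such a check to tell result pages from error or "no results" pages after submitting a form with values sampled from the repository. Matching on unordered tokens keeps the sketch short, but it can produce false positives on long pages; a real system would likely need stricter matching.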