As the Web grows, more and more data is becoming available through dynamic forms of publication, such as legacy databases accessed through an HTML form (the so-called hidden Web). In such situations, integrating this data increasingly relies on the fast generation of agents that can automatically fetch pages for further processing. As a result, there is an increasing need for tools that help users generate such agents. In this paper, we describe a method for automatically generating agents that collect hidden Web pages. This method uses a pre-existing data repository to identify the contents of these pages and takes advantage of patterns found among Web sites to identify the navigation paths to follow. To demonstrate the accuracy of our method, we discuss the results of a number of experiments carried out with sites from different domains.
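The two ideas named in the abstract, recognizing target pages by comparing them with a pre-existing repository and following navigation paths toward those pages, can be illustrated with a minimal sketch. This is not the paper's algorithm: the scoring function, the threshold, the breadth-first traversal, and the toy in-memory site (`SITE`, `REPOSITORY`, `repository_score`, `find_target_paths`) are all assumptions made for illustration only.

```python
from collections import deque


def repository_score(page_text, repository):
    """Fraction of repository values found in the page text (case-insensitive).

    Hypothetical scoring rule, assumed here for illustration.
    """
    if not repository:
        return 0.0
    text = page_text.lower()
    hits = sum(1 for value in repository if value.lower() in text)
    return hits / len(repository)


def find_target_paths(site, start, repository, threshold=0.5, max_depth=3):
    """Breadth-first search over a toy site.

    `site` maps a URL to a (page_text, outgoing_links) pair. Returns every
    navigation path from `start` to a page whose score meets `threshold`.
    """
    paths = []
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        text, links = site[path[-1]]
        if repository_score(text, repository) >= threshold:
            paths.append(path)
            continue  # target page reached; do not expand it further
        if len(path) - 1 < max_depth:
            for link in links:
                if link in site and link not in seen:
                    seen.add(link)
                    queue.append(path + [link])
    return paths


# Toy in-memory site standing in for real HTTP fetches (assumption).
SITE = {
    "/": ("Welcome to the bookstore", ["/about", "/search"]),
    "/about": ("About us", []),
    "/search": ("Search form", ["/results?q=a"]),
    "/results?q=a": ("Moby Dick by Herman Melville; Dracula by Bram Stoker", []),
}
# Pre-existing repository of known values from the domain (assumption).
REPOSITORY = ["Moby Dick", "Dracula", "Emma"]

paths = find_target_paths(SITE, "/", REPOSITORY)
print(paths)
```

In this sketch the results page is recognized because two of the three repository values appear in it, and the returned path records the sequence of links an agent would need to follow. A real agent would also have to fill in and submit forms, which this example leaves out.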