Today's crawler engines cannot reach most of the information contained in the Web. A great amount of valuable information is "hidden" behind the query forms of online databases, is dynamically generated by client-side technologies such as JavaScript, or both. This portion of the Web is usually known as the Deep Web or the Hidden Web. We have built DeepBot, a prototype hidden-web crawler able to access such content. DeepBot receives as input a set of domain definitions, each one describing a specific data-collecting task, and automatically identifies the forms relevant to them and learns to execute queries on those forms. In this paper we describe the techniques employed for building DeepBot and report the experimental results obtained when testing it with several real-world data collection tasks.
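To make the idea of a domain definition more concrete, the following is a minimal sketch of how a crawler might decide whether a form is relevant to a data-collecting task. It is not DeepBot's actual implementation: the class names `Attribute` and `DomainDefinition`, the alias lists, the substring matching, and the `threshold` value are all assumptions introduced here for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Attribute:
    """One field of a data-collecting task (hypothetical structure)."""
    name: str
    aliases: list[str] = field(default_factory=list)

    def matches(self, label: str) -> bool:
        # Assumed heuristic: a form label matches if it contains the
        # attribute name or any alias, ignoring case.
        text = label.lower()
        return any(term.lower() in text for term in [self.name, *self.aliases])


@dataclass
class DomainDefinition:
    """Describes one data-collecting task (hypothetical structure)."""
    name: str
    attributes: list[Attribute]
    threshold: float = 0.5  # assumed cut-off, not taken from the paper

    def relevance(self, form_labels: list[str]) -> float:
        # Fraction of domain attributes matched by at least one form label.
        if not self.attributes:
            return 0.0
        matched = sum(
            any(attr.matches(label) for label in form_labels)
            for attr in self.attributes
        )
        return matched / len(self.attributes)

    def is_relevant(self, form_labels: list[str]) -> bool:
        return self.relevance(form_labels) >= self.threshold


# Example domain: a book-search task with three attributes.
books = DomainDefinition(
    name="books",
    attributes=[
        Attribute("title", ["book title"]),
        Attribute("author", ["written by"]),
        Attribute("isbn"),
    ],
)

search_form = ["Title", "Author name", "Price"]
login_form = ["Username", "Password"]

print(books.is_relevant(search_form))
print(books.is_relevant(login_form))
```

In this sketch the search form matches two of the three attributes and is accepted, while the login form matches none and is rejected. A real crawler would also need to extract the labels from the page layout and learn how to fill in and submit the form, which this example does not attempt.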