Focused crawling: a new approach to topic-specific Web resource discovery
WWW '99 Proceedings of the eighth international conference on World Wide Web
QProber: A system for automatic classification of hidden-Web databases
ACM Transactions on Information Systems (TOIS)
Proceedings of the 27th International Conference on Very Large Data Bases
ACM SIGIR Forum
Automated discovery of search interfaces on the web
ADC '03 Proceedings of the 14th Australasian database conference - Volume 17
Crawling for Domain-Speci.c Hidden Web Resources
WISE '03 Proceedings of the Fourth International Conference on Web Information Systems Engineering
Automatic generation of agents for collecting hidden web pages for data extraction
Data & Knowledge Engineering - Special issue: WIDM 2002
Organizing structured web sources by query schemas: a clustering approach
Proceedings of the thirteenth ACM international conference on Information and knowledge management
Structured databases on the web: observations and implications
ACM SIGMOD Record
Data & Knowledge Engineering
Query Selection Techniques for Efficient Crawling of Structured Web Sources
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems)
Combining classifiers to identify online databases
Proceedings of the 16th international conference on World Wide Web
Proceedings of the VLDB Endowment
Automating the Design and Construction of Query Forms
IEEE Transactions on Knowledge and Data Engineering
A study of cross-validation and bootstrap for accuracy estimation and model selection
IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 2
Host-IP Clustering Technique for Deep Web Characterization
APWEB '10 Proceedings of the 2010 12th International Asia-Pacific Web Conference
On estimating the scale of national deep web
DEXA'07 Proceedings of the 18th international conference on Database and Expert Systems Applications
Host-IP clustering technique for deep web characterization
Proceedings of the 2010 ACM Symposium on Applied Computing
Current challenges in web crawling
ICWE'13 Proceedings of the 13th international conference on Web Engineering
Hi-index | 0.00 |
A huge portion of the Web known as the deep Web is accessible via search interfaces to myriads of databases on the Web. While relatively good approaches for querying the contents of web databases have been recently proposed, one cannot fully utilize them having most search interfaces unlocated. Thus, the automatic recognition of search interfaces to online databases is crucial for any application accessing the deep Web. This paper describes the architecture of the I-Crawler, a system for finding and classifying search interfaces. The I-Crawler is intentionally designed to be used in the deep web characterization surveys and for constructing directories of deep web resources.