On building a search interface discovery system

Authors:
Denis Shestakov
Affiliations:
Department of Media Technology, Aalto University, Espoo, Finland
Venue:
RED'09 Proceedings of the 2nd international conference on Resource discovery
Year:
2009

Abstract

A huge portion of the Web, known as the deep Web, is accessible via search interfaces to the myriad databases on the Web. Although relatively good approaches for querying the contents of web databases have recently been proposed, they cannot be fully utilized while most search interfaces remain unlocated. The automatic recognition of search interfaces to online databases is therefore crucial for any application that accesses the deep Web. This paper describes the architecture of the I-Crawler, a system for finding and classifying search interfaces. The I-Crawler is specifically designed for use in deep web characterization surveys and for constructing directories of deep web resources.