Query routing for Web search engines: architectures and experiments
Proceedings of the 9th international World Wide Web conference on Computer networks : the international journal of computer and telecommunications netowrking
Query-based sampling of text databases
ACM Transactions on Information Systems (TOIS)
Automatic information extraction from web pages
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Introduction to Modern Information Retrieval
Introduction to Modern Information Retrieval
QProber: A system for automatic classification of hidden-Web databases
ACM Transactions on Information Systems (TOIS)
Automatic Information Discovery from the "Invisible Web"
ITCC '02 Proceedings of the International Conference on Information Technology: Coding and Computing
A two-phase sampling technique for information extraction from hidden web databases
Proceedings of the 6th annual ACM international workshop on Web information and data management
Foundations and Trends in Information Retrieval
Online social network profile data extraction for vulnerability analysis
International Journal of Internet Technology and Secured Transactions
Hi-index | 0.00 |
The larger amount of information on the Web is stored in document databases and is not indexed by general-purpose search engines (i.e., Google and Yahoo). Such information is dynamically generated through querying databases - which are referred to as Hidden Web databases. Documents returned in response to a user query are typically presented using template-generated Web pages. This paper proposes a novel approach that identifies Web page templates by analysing the textual contents and the adjacent tag structures of a document in order to extract query-related data. Preliminary results demonstrate that our approach effectively detects templates and retrieves data with high recall and precision.