Hidden Web databases maintain a collection of documents, which are dynamically generated using Web page templates in response to user queries. This paper presents a technique, Text with Neighbouring Adjacent Tag Segments (TNATS), to represent the contents of documents retrieved from an underlying database. TNATS exploits the tag structures that surround the textual content of a document. This representation facilitates the detection of Web page templates and the extraction of query-related information from documents. We compare the performance of TNATS with existing techniques based on tag-tree and text-only representations. Experimental results demonstrate that TNATS requires less processing time for information extraction than a tag-tree representation. It also produces the best results of the compared techniques in terms of detecting Web page templates and extracting query-related information.
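The abstract does not spell out how a text segment and its neighbouring tag segments are stored or compared, so the following is only an illustrative sketch of the general idea, not the paper's algorithm. It assumes that each run of text is paired with the tags immediately before and after it, and that a segment appearing identically in every sampled page belongs to the template. The names `SegmentParser`, `tag_text_segments`, and `split_template` are hypothetical, and the "present in every page" rule is an assumed simplification.

```python
from collections import Counter
from html.parser import HTMLParser


class SegmentParser(HTMLParser):
    """Collect (tags_before, text, tags_after) triples from an HTML page."""

    def __init__(self):
        super().__init__()
        self.segments = []      # triples collected so far
        self.pending_tags = []  # tags seen since the last text segment

    def handle_starttag(self, tag, attrs):
        self.pending_tags.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.pending_tags.append(f"</{tag}>")

    def handle_data(self, data):
        text = " ".join(data.split())  # normalise whitespace
        if not text:
            return
        tags = tuple(self.pending_tags)
        self.pending_tags = []
        if self.segments:
            before, prev_text, _ = self.segments[-1]
            if not tags:
                # No tag separates the two chunks: treat them as one text run.
                self.segments[-1] = (before, prev_text + " " + text, ())
                return
            # Tags preceding this text also follow the previous text.
            self.segments[-1] = (before, prev_text, tags)
        self.segments.append((tags, text, ()))

    def close(self):
        super().close()
        # Attach any trailing tags to the last text segment.
        if self.segments and self.pending_tags:
            before, text, _ = self.segments[-1]
            self.segments[-1] = (before, text, tuple(self.pending_tags))
            self.pending_tags = []


def tag_text_segments(html):
    """Return the text-with-neighbouring-tags triples for one page."""
    parser = SegmentParser()
    parser.feed(html)
    parser.close()
    return parser.segments


def split_template(pages):
    """Assumed heuristic: a triple found in every page is template text."""
    per_page = [tag_text_segments(page) for page in pages]
    counts = Counter(seg for segs in per_page for seg in set(segs))
    template = {seg for seg, n in counts.items() if n == len(pages)}
    content = [[seg[1] for seg in segs if seg not in template]
               for segs in per_page]
    return template, content
```

For two result pages that differ only in one paragraph, `split_template` would return the shared heading and footer triples as the template and the differing paragraph texts as per-page content. Because each triple is compared as a flat tuple, no tag tree has to be built or traversed, which is one plausible reading of why such a representation could be cheaper to process than a tag tree.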