A TNATS approach to hidden web documents

Authors:
Yih-Ling Hedley; Muhammad Younas; Anne James
Affiliations:
School of Mathematical and Information Sciences, Coventry University, Coventry, UK; School of Mathematical and Information Sciences, Coventry University, Coventry, UK; School of Mathematical and Information Sciences, Coventry University, Coventry, UK
Venue:
ICDCIT'04 Proceedings of the First international conference on Distributed Computing and Internet Technology
Year:
2004

Abstract

Hidden Web databases maintain a collection of documents, which are dynamically generated using Web page templates in response to user queries. This paper presents a technique, Text with Neighbouring Adjacent Tag Segments (TNATS), to represent the contents of documents retrieved from an underlying database. TNATS exploits tag structures that surround the textual content of a document. This representation facilitates the process of detecting Web page templates and extracting query-related information from documents. We compare the performance of TNATS with existing techniques based on tag tree and text-only representations. Experimental results demonstrate that TNATS requires less processing time for information extraction than a tag tree representation. It also produces optimum results in terms of detecting Web page templates and extracting query-related information.
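
The abstract does not spell out the TNATS algorithm, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes that each text segment is represented together with the run of tags immediately before and after it, and that a segment appearing with identical text and identical neighbouring tags on every sampled page belongs to the template. The function names `text_with_neighbouring_tags` and `split_template`, and the "present on every page" rule, are hypothetical choices made for this example.

```python
from collections import Counter
from html.parser import HTMLParser


class _Tokenizer(HTMLParser):
    """Flattens a page into a sequence of ("tag", ...) and ("text", ...) tokens."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # Attributes are ignored here; that is a simplifying assumption.
        self.tokens.append(("tag", f"<{tag}>"))

    def handle_endtag(self, tag):
        self.tokens.append(("tag", f"</{tag}>"))

    def handle_data(self, data):
        text = " ".join(data.split())  # normalise whitespace
        if text:                       # skip whitespace-only gaps between tags
            self.tokens.append(("text", text))


def text_with_neighbouring_tags(html):
    """Return (tags_before, text, tags_after) for every text segment in html."""
    parser = _Tokenizer()
    parser.feed(html)
    parser.close()
    tokens = parser.tokens
    segments = []
    for i, (kind, value) in enumerate(tokens):
        if kind != "text":
            continue
        before, j = [], i - 1
        while j >= 0 and tokens[j][0] == "tag":
            before.append(tokens[j][1])
            j -= 1
        after, j = [], i + 1
        while j < len(tokens) and tokens[j][0] == "tag":
            after.append(tokens[j][1])
            j += 1
        segments.append(("".join(reversed(before)), value, "".join(after)))
    return segments


def split_template(pages):
    """Separate segments shared by all pages (assumed template) from the rest."""
    per_page = [text_with_neighbouring_tags(page) for page in pages]
    counts = Counter(seg for segs in per_page for seg in set(segs))
    template = {seg for seg, n in counts.items() if n == len(pages)}
    content = [[seg for seg in segs if seg not in template] for segs in per_page]
    return template, content


if __name__ == "__main__":
    page_a = "<html><body><h1>Results</h1><ul><li>heart disease</li></ul><p>Next page</p></body></html>"
    page_b = "<html><body><h1>Results</h1><ul><li>lung cancer</li></ul><p>Next page</p></body></html>"
    template, content = split_template([page_a, page_b])
    print(sorted(seg[1] for seg in template))
    print([seg[1] for seg in content[0]], [seg[1] for seg in content[1]])
```

Because each segment carries only its adjacent tags rather than a position in a full tag tree, comparing pages reduces to set operations over small tuples. This is one plausible reading of why such a representation could be cheaper to process than a tag tree, though the sketch makes no claim about the paper's measured results.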