A TNATS approach to hidden web documents

  • Authors:
  • Yih-Ling Hedley; Muhammad Younas; Anne James

  • Affiliations:
  • School of Mathematical and Information Sciences, Coventry University, Coventry, UK (all three authors)

  • Venue:
  • ICDCIT '04: Proceedings of the First International Conference on Distributed Computing and Internet Technology
  • Year:
  • 2004

Abstract

Hidden Web databases maintain a collection of documents, which are dynamically generated using Web page templates in response to user queries. This paper presents a technique, Text with Neighbouring Adjacent Tag Segments (TNATS), to represent the contents of documents retrieved from an underlying database. TNATS exploits the tag structures that surround the textual content of a document. This representation facilitates detecting Web page templates and extracting query-related information from documents. We compare the performance of TNATS with existing techniques based on tag tree and text-only representations. Experimental results demonstrate that TNATS requires less processing time for information extraction than a tag tree representation. It also produces optimum results in terms of detecting Web page templates and extracting query-related information.
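The abstract does not give the algorithm itself, so what follows is only a minimal sketch of the idea, under two loudly stated assumptions: (1) a "tag segment" is the maximal run of consecutive tags directly before or directly after a piece of text, and (2) template detection flags any text-plus-tags segment that recurs across every sampled result page from the same database, leaving the remainder as query-related content. The helper names `tnats_segments` and `split_template` and the `min_share` threshold are hypothetical illustrations, not the authors' published method. Only Python's standard `html.parser` and `collections` modules are used.

```python
from collections import Counter
from html.parser import HTMLParser


class _Tokenizer(HTMLParser):
    """Flatten HTML into ("tag", "<name>") and ("text", "...") tokens."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(("tag", f"<{tag}>"))

    def handle_endtag(self, tag):
        self.tokens.append(("tag", f"</{tag}>"))

    def handle_startendtag(self, tag, attrs):
        self.tokens.append(("tag", f"<{tag}/>"))

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only runs between tags
            self.tokens.append(("text", text))


def tnats_segments(html):
    """Return (tags_before, text, tags_after) triples, one per text run.

    Assumption (not from the paper): the "neighbouring adjacent tag segments"
    are the maximal runs of tags directly before and directly after each text run.
    """
    parser = _Tokenizer()
    parser.feed(html)
    parser.close()
    toks = parser.tokens
    segments = []
    for i, (kind, value) in enumerate(toks):
        if kind != "text":
            continue
        # Walk left over the contiguous run of tags preceding this text.
        before, j = [], i - 1
        while j >= 0 and toks[j][0] == "tag":
            before.append(toks[j][1])
            j -= 1
        # Walk right over the contiguous run of tags following this text.
        after, k = [], i + 1
        while k < len(toks) and toks[k][0] == "tag":
            after.append(toks[k][1])
            k += 1
        segments.append((tuple(reversed(before)), value, tuple(after)))
    return segments


def split_template(pages, min_share=1.0):
    """Hypothetical rule: a segment found in >= min_share of pages is template.

    Returns (template_set, per_page_query_related_segments).
    """
    per_page = [tnats_segments(p) for p in pages]
    counts = Counter()
    for segs in per_page:
        counts.update(set(segs))  # count each segment at most once per page
    threshold = min_share * len(pages)
    template = {s for s, c in counts.items() if c >= threshold}
    query_related = [[s for s in segs if s not in template] for segs in per_page]
    return template, query_related


if __name__ == "__main__":
    pages = [
        "<html><body><h1>Results</h1><p>Cancer therapy</p><div>Footer</div></body></html>",
        "<html><body><h1>Results</h1><p>Heart disease</p><div>Footer</div></body></html>",
    ]
    template, query_related = split_template(pages)
    print(sorted(s[1] for s in template))  # ['Footer', 'Results']
    print([[s[1] for s in segs] for segs in query_related])  # [['Cancer therapy'], ['Heart disease']]
```

Keying each segment on its text together with its surrounding tags means a string such as "Results" counts as template only when it also sits in the same tag context on every sampled page. A genuine query result that happens to be identical on all sampled pages would be misclassified under this simple rule, so a larger or more varied sample of result pages would likely be needed in practice.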