A New Study on Using HTML Structures to Improve Retrieval

  • Authors:
  • M. Cutler; H. Deng; S. S. Maniccam; W. Meng

  • Venue:
  • ICTAI '99 Proceedings of the 11th IEEE International Conference on Tools with Artificial Intelligence
  • Year:
  • 1999

Abstract

Locating useful information effectively on the World Wide Web (WWW) is of wide interest. This paper presents new results on a methodology that uses the structures and hyperlinks of HTML documents to improve the effectiveness of retrieving them. The methodology partitions the occurrences of terms in a document collection into classes according to the tags in which each term appears (such as Title, H1-H6, and Anchor). The rationale is that terms appearing in different structures of a document may differ in how well they identify the document. The weighting schemes of traditional information retrieval are extended to include class importance values. We implemented a genetic algorithm to determine a "best so far" combination of class importance factors. Our experiments indicate that this technique can improve retrieval effectiveness by 39.6% or more.
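
The class-based weighting described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The class names (title, header, anchor, plain), the tag-to-class mapping, the importance values, and the helper names ClassTermCounter and weighted_tf are all assumptions made for illustration. The sketch also simplifies by counting anchor text toward the document that contains the link, and it covers only the weighting step, not the genetic algorithm that searches for the importance values.

```python
from collections import Counter, defaultdict
from html.parser import HTMLParser
import re

# Hypothetical mapping from HTML tags to term classes; all other text is "plain".
TAG_CLASS = {"title": "title", "a": "anchor",
             **{f"h{i}": "header" for i in range(1, 7)}}


class ClassTermCounter(HTMLParser):
    """Counts term occurrences per class, using the innermost open mapped tag."""

    def __init__(self):
        super().__init__()
        self.open_classes = []               # stack of classes for open mapped tags
        self.counts = defaultdict(Counter)   # class name -> Counter of terms

    def handle_starttag(self, tag, attrs):
        if tag in TAG_CLASS:
            self.open_classes.append(TAG_CLASS[tag])

    def handle_endtag(self, tag):
        if tag in TAG_CLASS and self.open_classes:
            self.open_classes.pop()

    def handle_data(self, data):
        cls = self.open_classes[-1] if self.open_classes else "plain"
        self.counts[cls].update(re.findall(r"[a-z0-9]+", data.lower()))


def weighted_tf(html, importance):
    """Return term -> sum over classes of (class importance * occurrences)."""
    parser = ClassTermCounter()
    parser.feed(html)
    parser.close()
    scores = Counter()
    for cls, terms in parser.counts.items():
        for term, n in terms.items():
            # Classes missing from `importance` default to a weight of 1.0.
            scores[term] += importance.get(cls, 1.0) * n
    return dict(scores)


# Illustrative importance values only; they are not taken from the paper.
importance = {"title": 4.0, "header": 2.0, "anchor": 1.5, "plain": 1.0}
doc = ("<html><title>Genetic search</title><body><h1>Search</h1>"
       "<p>genetic search text</p></body></html>")
tf = weighted_tf(doc, importance)
print(tf)  # {'genetic': 5.0, 'search': 7.0, 'text': 1.0}
```

In this example "search" scores highest because it occurs in the title, a heading, and the body. The resulting weighted frequency could stand in for the raw term frequency in a conventional weighting scheme, and a search procedure such as the genetic algorithm mentioned in the abstract would then tune the importance values against relevance judgments.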