Metadata Based Web Mining for Relevance

  • Authors:
  • Jeonghee Yi; Neel Sundaresan

  • Venue:
  • IDEAS '00 Proceedings of the 2000 International Symposium on Database Engineering & Applications
  • Year:
  • 2000

Abstract

This paper presents a relevant term discoverer, a system that discovers topics relevant to a given topic from the World Wide Web. The system mines hyperlink metadata on the basis of the association of terms in the metadata. It also applies various filtering techniques to detect false positives and false negatives. The applications of the system include: (i) topic-specific information gathering systems that need to crawl resources on the relevant topics, (ii) bibliography search systems that need to extend their search to articles on relevant topics, and (iii) classification systems that categorize items of a similar class together. We report a successful application of the system to build a topic-specific search engine dedicated to eXtensible Markup Language (XML). Using the algorithms presented in this paper, we were able to identify the relevant topics that the search engine needs to cover. Together with effective topic-directed crawling algorithms, we were able to build a topic-specific search engine that requires significantly less human labor but performs almost as well as topic-specific search engines whose content is maintained by humans.