Metadata Based Web Mining for Topic-Specific Information Gathering

  • Authors:
  • Jeonghee Yi; Neel Sundaresan; Anita Huang

  • Venue:
  • EC-WEB '00 Proceedings of the First International Conference on Electronic Commerce and Web Technologies
  • Year:
  • 2000

Abstract

As the World-Wide-Web grows at an exponential rate, we are faced with the issue of rating pages in terms of quality and trust. In this situation, with significant linkage among web pages, what other pages say about a web page can be as important as, and more objective than, what the page says about itself. The cumulative knowledge of such recommendations (or the lack of them) can help a system decide whether or not to pursue a page. This metadata can also be used by a web robot program, for example, to derive summary information about web documents written in a foreign language. In this paper, we describe how we exploit this type of metadata to drive a web information gathering system, which forms the backend of a topic-specific search engine. The system uses metadata from hyperlinks to guide its crawl of the web so that it stays focused on a target topic. The crawler follows links that point to information related to the topic and avoids following links to irrelevant pages. Moreover, the system uses the metadata to improve its definition of the target topic through association mining. Ultimately, the guided crawling system builds a rich repository of metadata, which is used to serve the search engine.
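
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea, not the authors' system. It assumes that the hyperlink metadata is plain anchor text, that relevance is the fraction of anchor words found in a topic term set, and that topic refinement can be approximated by simple co-occurrence counting in place of full association mining. The in-memory `WEB` graph, the function names, and the `threshold` and `min_support` parameters are all hypothetical.

```python
import heapq
from collections import Counter

# Hypothetical in-memory web: page -> list of (anchor_text, target_page).
WEB = {
    "seed": [("xml parser tutorial", "a"), ("celebrity gossip", "b")],
    "a": [("xml schema validation", "c"), ("cooking recipes", "d")],
    "b": [("movie reviews", "e")],
    "c": [],
    "d": [],
    "e": [],
}


def score(anchor_text, topic_terms):
    """Return the fraction of anchor words that are topic terms (assumed metric)."""
    words = anchor_text.lower().split()
    if not words:
        return 0.0
    return sum(w in topic_terms for w in words) / len(words)


def focused_crawl(seed, topic_terms, threshold=0.3):
    """Visit pages best-first by link score; never follow links below threshold."""
    visited = []
    seen = {seed}
    # heapq is a min-heap, so scores are negated to pop the best link first.
    frontier = [(-1.0, seed)]
    while frontier:
        _, page = heapq.heappop(frontier)
        visited.append(page)
        for anchor, target in WEB.get(page, []):
            if target in seen:
                continue
            link_score = score(anchor, topic_terms)
            if link_score >= threshold:
                seen.add(target)
                heapq.heappush(frontier, (-link_score, target))
    return visited


def expand_topic(anchors, topic_terms, min_support=2):
    """Add words that co-occur with a topic term in at least min_support anchors.

    This is a simple co-occurrence count standing in for association mining.
    """
    counts = Counter()
    for anchor in anchors:
        words = set(anchor.lower().split())
        if words & topic_terms:
            counts.update(words - topic_terms)
    return topic_terms | {w for w, c in counts.items() if c >= min_support}
```

Calling `focused_crawl("seed", {"xml"})` on this toy graph follows only the links whose anchor text mentions the topic, so pages reached through unrelated anchors are never visited. Passing the anchors collected during a crawl to `expand_topic` then widens the term set used for the next crawl.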