Metadata Based Web Mining for Topic-Specific Information Gathering

Authors:
Jeonghee Yi; Neel Sundaresan; Anita Huang
Venue:
EC-WEB '00 Proceedings of the First International Conference on Electronic Commerce and Web Technologies
Year:
2000

Abstract

As the World Wide Web grows at an exponential rate, we are faced with the issue of rating pages in terms of quality and trust. In this situation, with significant linkage among web pages, what other pages say about a web page can be as important as, and more objective than, what the page says about itself. The cumulative knowledge of such recommendations (or the lack of them) can help a system decide whether to pursue a page or not. This metadata can also be used by a web robot program, for example, to derive summary information about web documents written in a foreign language. In this paper, we describe how we exploit this type of metadata to drive a web information gathering system, which forms the backend of a topic-specific search engine. The system uses metadata from hyperlinks to guide its crawl of the web so that it stays focused on a target topic. The crawler follows links that point to information related to the topic and avoids following links to irrelevant pages. Moreover, the system uses the metadata to improve its definition of the target topic through association mining. Ultimately, the guided crawling system builds a rich repository of metadata, which is used to serve the search engine.
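
The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea, not the authors' system. It assumes that the link metadata is plain anchor text, that relevance is the fraction of anchor words that are topic terms, and that a simple co-occurrence count stands in for association mining. The function names, the `threshold` and `min_support` parameters, and the in-memory `web` dictionary are all hypothetical.

```python
import heapq
from collections import Counter


def link_score(anchor_text, topic_terms):
    """Fraction of anchor-text words that are topic terms (assumed relevance measure)."""
    words = anchor_text.lower().split()
    if not words:
        return 0.0
    return sum(w in topic_terms for w in words) / len(words)


def focused_crawl(web, seed, topic_terms, threshold=0.3, min_support=2, max_pages=10):
    """Best-first crawl over `web`, a dict mapping url -> [(anchor_text, target_url)].

    Links whose anchor text scores below `threshold` are never followed.
    Words that appear in at least `min_support` followed anchors are added
    to the topic, a crude stand-in for association mining.
    """
    topic = set(topic_terms)
    cooccur = Counter()            # non-topic word -> number of relevant anchors containing it
    visited = []
    seen = {seed}
    frontier = [(-1.0, seed)]      # min-heap on negated score, so the best link pops first
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        visited.append(url)
        for anchor, target in web.get(url, []):
            score = link_score(anchor, topic)
            if score < threshold or target in seen:
                continue           # irrelevant or already queued: do not follow
            seen.add(target)
            heapq.heappush(frontier, (-score, target))
            for word in anchor.lower().split():
                if word not in topic:
                    cooccur[word] += 1
                    if cooccur[word] >= min_support:
                        topic.add(word)   # refine the topic definition
    return visited, topic
```

Calling `focused_crawl` on a small hand-built link graph returns the pages in the order they were visited together with the expanded topic term set. A real crawler would fetch pages over HTTP and would likely use a proper association-rule miner instead of the raw counts used here.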