ParaSite: mining structural information on the Web
Selected papers from the sixth international conference on World Wide Web
Improved algorithms for topic distillation in a hyperlinked environment
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Efficient crawling through URL ordering
WWW7 Proceedings of the seventh international conference on World Wide Web 7
A smart itsy bitsy spider for the web
Journal of the American Society for Information Science - Special topic issue: artificial intelligence techniques for emerging information systems applications
Trawling the Web for emerging cyber-communities
WWW '99 Proceedings of the eighth international conference on World Wide Web
Focused crawling: a new approach to topic-specific Web resource discovery
WWW '99 Proceedings of the eighth international conference on World Wide Web
Authoritative sources in a hyperlinked environment
Proceedings of the ninth annual ACM-SIAM symposium on Discrete algorithms
Fast Algorithms for Mining Association Rules in Large Databases
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Metadata Based Web Mining for Relevance
IDEAS '00 Proceedings of the 2000 International Symposium on Database Engineering & Applications
A machine learning approach to building domain-specific search engines
IJCAI'99 Proceedings of the 16th international joint conference on Artificial intelligence - Volume 2
Hi-index | 0.00 |
As the World-Wide-Web grows at an exponential rate, we are faced with the issue of rating pages in terms of quality and trust. In this siutation, with significant linkage among web pages, what other pages say about a web page can be as important as and more objective than what the page says about itself. The cumulative knowledge of such recommendations (or lack of them) can help a system to decide whether to pursue a page or not. This metadata information can also be used by a web robot program, for example, to derive summary information about web documents written in a foreign language. In this paper, we describe how we exploit this type of metadata to drive a web information gathering system, which forms the backend of a topic-specific search engine. The system uses metadata from hyperlinks to guide itself to crawl the web staying focused on a target topic. The crawler follows links that point to information related to the topic and avoids following links to irrelevant pages. Moreover, the system uses the metadata to improve its definition of the target topic through association mining. Ultimately, the guided crawling system builds a rich repository of metadata information, which is used to serve the search engine.