Focused crawling: a new approach to topic-specific Web resource discovery
WWW '99 Proceedings of the eighth international conference on World Wide Web
Intelligent crawling on the World Wide Web with arbitrary predicates
Proceedings of the 10th international conference on World Wide Web
Collaborative crawling: mining user experiences for topical resource discovery
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
On Leveraging User Access Patterns for Topic Specific Crawling
Data Mining and Knowledge Discovery
Link Contexts in Classifier-Guided Topical Crawlers
IEEE Transactions on Knowledge and Data Engineering
Incremental mining of information interest for personalized web scanning
Information Systems
Web Semantics: Science, Services and Agents on the World Wide Web
No Code Required: Giving Users Tools to Transform the Web
The rapid growth of the World Wide Web has made the problem of topic-specific resource discovery an important one in recent years. In this problem, the goal is to find web pages which satisfy a predicate specified by the user. Such a predicate could be a keyword query, a topical query, or some arbitrary constraint. Several techniques, such as focused crawling and intelligent crawling, have recently been proposed for topic-specific resource discovery. All these crawlers are linkage-based, since they use hyperlink behavior in order to perform resource discovery. Recent studies have shown that the topical correlations in hyperlinks are quite noisy and may not always show the consistency necessary for a reliable resource discovery process. In this paper, we approach the problem of resource discovery from an entirely different perspective: we mine the significant browsing patterns of World Wide Web users in order to model the likelihood of web pages satisfying a specified predicate. This user behavior can be mined from the freely available traces of large public-domain proxies on the World Wide Web. We refer to this technique as collaborative crawling because it mines the collective user experiences in order to find topical resources. Such a strategy is extremely effective because the topical consistency in World Wide Web browsing patterns turns out to be very reliable. In addition, the user-centered crawling system can be combined with linkage-based systems to create an overall system which works more effectively than a system based purely on either user behavior or hyperlinks.
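The idea of ranking uncrawled pages by user behavior can be illustrated with a minimal sketch. This is not the paper's actual model: the abstract does not give the scoring formula, so the co-visit counting and Laplace smoothing below are assumptions chosen for simplicity, and the names `score_candidates`, `traces`, and `known_relevance` are hypothetical. The sketch assumes the proxy trace has already been parsed into per-user URL lists and that some pages have already been crawled and tested against the predicate.

```python
from collections import defaultdict


def score_candidates(traces, known_relevance):
    """Score uncrawled URLs by the relevance of pages co-visited with them.

    traces: dict mapping a user id to the list of URLs that user accessed
        (assumed to be pre-parsed from a proxy trace).
    known_relevance: dict mapping already-crawled URLs to True/False,
        i.e. whether the page satisfied the user-specified predicate.
    Returns a dict mapping each uncrawled URL to a smoothed score in (0, 1).
    """
    hits = defaultdict(int)   # co-visits with predicate-satisfying pages
    total = defaultdict(int)  # co-visits with any crawled page
    for urls in traces.values():
        seen = set(urls)  # count each URL once per user
        crawled = [u for u in seen if u in known_relevance]
        n_relevant = sum(known_relevance[u] for u in crawled)
        for url in seen:
            if url in known_relevance:
                continue  # already crawled, no need to score it
            hits[url] += n_relevant
            total[url] += len(crawled)
    # Assumed Laplace smoothing: URLs with little evidence stay near 0.5.
    return {u: (hits[u] + 1) / (total[u] + 2) for u in total}


def crawl_order(scores):
    """Return candidate URLs sorted from most to least promising."""
    return sorted(scores, key=scores.get, reverse=True)
```

A crawler built along these lines would fetch candidates in the order given by `crawl_order`, test each fetched page against the predicate, add the result to `known_relevance`, and rescore. Combining with a linkage-based crawler, as the abstract suggests, could be as simple as blending this score with a link-derived one, though the weighting would again be an assumption.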