Collaborative crawling: mining user experiences for topical resource discovery

Authors:
Charu C. Aggarwal
Affiliations:
IBM T. J. Watson Research Center, Yorktown Heights, NY
Venue:
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2002

References:

Focused crawling: a new approach to topic-specific Web resource discovery
Proceedings of the eighth international conference on World Wide Web (WWW '99)

Intelligent crawling on the World Wide Web with arbitrary predicates
Proceedings of the 10th international conference on World Wide Web

Cited by:

On Leveraging User Access Patterns for Topic Specific Crawling
Data Mining and Knowledge Discovery

Link Contexts in Classifier-Guided Topical Crawlers
IEEE Transactions on Knowledge and Data Engineering

Incremental mining of information interest for personalized web scanning
Information Systems

Semantic Web Mining
Web Semantics: Science, Services and Agents on the World Wide Web

No Code Required: Giving Users Tools to Transform the Web

Abstract

The rapid growth of the World Wide Web has made topic-specific resource discovery an important problem in recent years. In this problem, the goal is to find web pages that satisfy a predicate specified by the user. Such a predicate could be a keyword query, a topical query, or some arbitrary constraint. Several techniques, such as focused crawling and intelligent crawling, have recently been proposed for topic-specific resource discovery. All of these crawlers are linkage-based, since they use hyperlink behavior to perform resource discovery. Recent studies have shown that the topical correlations in hyperlinks are quite noisy and may not always show the consistency necessary for a reliable resource discovery process. In this paper, we approach the problem of resource discovery from an entirely different perspective: we mine the significant browsing patterns of World Wide Web users in order to model the likelihood that a web page satisfies a specified predicate. This user behavior can be mined from the freely available traces of large public-domain proxies on the World Wide Web. We refer to this technique as collaborative crawling, because it mines collective user experiences in order to find topical resources. Such a strategy is extremely effective because the topical consistency of World Wide Web browsing patterns turns out to be very reliable. In addition, the user-centered crawling system can be combined with linkage-based systems to create an overall system that works more effectively than one based purely on either user behavior or hyperlinks.
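As a rough illustration of the idea described in the abstract, the sketch below ranks unvisited URLs using only a proxy trace: users whose already-judged requests mostly satisfy the predicate are treated as "on-topic", and pages they requested are scored higher. This is not the paper's algorithm. The `(user_id, url)` trace format, the add-one smoothing, the mean-over-visitors scoring rule, and the linear blend weight `alpha` in the hypothetical `combined_priority` helper are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical proxy-trace format (an assumption): one (user_id, url) pair per request.
Trace = List[Tuple[str, str]]


def user_topicality(trace: Trace, judged: Dict[str, bool]) -> Dict[str, float]:
    """Smoothed fraction of each user's judged pages that satisfy the predicate.

    `judged` maps already-crawled URLs to the predicate's verdict on them.
    Add-one (Laplace) smoothing gives users with no judged pages a neutral 0.5.
    """
    hits: Dict[str, int] = defaultdict(int)
    seen: Dict[str, int] = defaultdict(int)
    for user, url in set(trace):  # count each (user, url) pair once
        if url in judged:
            seen[user] += 1
            hits[user] += int(judged[url])
    users = {user for user, _ in trace}
    return {u: (hits[u] + 1) / (seen[u] + 2) for u in users}


def collaborative_scores(trace: Trace, judged: Dict[str, bool]) -> Dict[str, float]:
    """Score each unjudged URL by the mean topicality of the users who requested it."""
    topicality = user_topicality(trace, judged)
    visitors: Dict[str, Set[str]] = defaultdict(set)
    for user, url in trace:
        if url not in judged:
            visitors[url].add(user)
    return {
        url: sum(topicality[u] for u in users) / len(users)
        for url, users in visitors.items()
    }


def combined_priority(user_score: float, link_score: float, alpha: float = 0.5) -> float:
    """Hypothetical linear blend of user-based and linkage-based evidence."""
    return alpha * user_score + (1 - alpha) * link_score


trace = [
    ("u1", "a.example/ml"), ("u1", "a.example/kdd"), ("u1", "new.example/x"),
    ("u2", "b.example/sports"), ("u2", "new.example/y"),
]
judged = {"a.example/ml": True, "a.example/kdd": True, "b.example/sports": False}
scores = collaborative_scores(trace, judged)
frontier = sorted(scores, key=scores.get, reverse=True)
print(frontier)  # ['new.example/x', 'new.example/y']
```

Here `u1` has two on-topic judged pages (smoothed topicality 0.75) while `u2` has one off-topic page (about 0.33), so the page requested by `u1` moves to the front of the crawl frontier. `combined_priority` shows one simple way a linkage-based score could be folded in; a real system would need to choose how to weight the two sources of evidence.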