Geographically focused collaborative crawling

Authors:
Weizheng Gao; Hyun Chul Lee; Yingbo Miao
Affiliations:
Genieknows.com, Halifax, NS, Canada; University of Toronto, Toronto, ON, Canada; Genieknows.com, Halifax, NS, Canada
Venue:
Proceedings of the 15th international conference on World Wide Web
Year:
2006

Abstract

A collaborative crawler is a group of crawling nodes in which each node is responsible for a specific portion of the web. We study the problem of collecting geographically aware pages using collaborative crawling strategies. We first propose several collaborative crawling strategies for geographically focused crawling, whose goal is to collect web pages about specified geographic locations, by considering features such as the URL address of a page, the content of a page, and the extended anchor text of a link, among others. We then propose various evaluation criteria to assess the performance of such crawling strategies. Finally, we experimentally study our strategies by crawling real web data, showing that some of them greatly outperform simple URL-hash-based partition collaborative crawling, in which crawling assignments are determined by computing a hash value over URLs. More precisely, features such as the URL address of a page and the extended anchor text of a link are shown to yield the best overall performance for geographically focused crawling.
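
To make the contrast in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It compares a URL-hash partition with one possible URL-based geographic assignment. The names `hash_assign`, `url_geo_assign`, and `CITY_TO_NODE`, the city-to-node mapping, and the fallback to hashing when no city name is found are all assumptions made for illustration.

```python
import hashlib
import re

# Hypothetical mapping from city names to crawling-node ids (illustration only).
CITY_TO_NODE = {"halifax": 0, "toronto": 1}


def hash_assign(url: str, num_nodes: int) -> int:
    """Baseline: choose a node from a stable hash computed over the URL."""
    digest = hashlib.md5(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def url_geo_assign(url: str, num_nodes: int) -> int:
    """Assumed URL-based strategy: route by the first known city name in the URL."""
    for token in re.split(r"[^a-z]+", url.lower()):
        if token in CITY_TO_NODE:
            return CITY_TO_NODE[token]
    # No geographic cue in the URL: fall back to the hash partition (an assumption).
    return hash_assign(url, num_nodes)


if __name__ == "__main__":
    for u in ("http://www.halifax-news.example/sports",
              "http://toronto.example.org/",
              "http://www.example.com/about"):
        print(u, hash_assign(u, 2), url_geo_assign(u, 2))
```

In this sketch the hash partition spreads URLs across nodes without regard to location, whereas the URL-based assignment sends pages whose address names a city to the node responsible for that city. The content-based and anchor-text-based features mentioned in the abstract would need the page or link text and are not covered here.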