A collaborative crawler is a group of crawling nodes in which each node is responsible for a specific portion of the web. We study the problem of collecting geographically aware pages using collaborative crawling strategies. We first propose several collaborative crawling strategies for geographically focused crawling, whose goal is to collect web pages about specified geographic locations, by considering features such as the URL of a page, the content of a page, and the extended anchor text of a link, among others. We then propose several evaluation criteria to assess the performance of such crawling strategies. Finally, we evaluate our strategies experimentally by crawling real web data, showing that some of them greatly outperform simple URL-hash-based partitioning, in which crawling assignments are determined by computing a hash value over URLs. More precisely, features such as the URL of a page and the extended anchor text of a link are shown to yield the best overall performance for geographically focused crawling.
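To make the contrast between the two assignment schemes concrete, here is a minimal sketch in Python. It is not the paper's implementation: the function names, the tiny `CITY_TO_NODE` gazetteer, and the choice of MD5 as the hash are all assumptions made for illustration. The baseline assigns a URL to a node purely by hashing it, while the geographic variant first looks for a known city name in the URL or in the extended anchor text of the link and only falls back to hashing when it finds none.

```python
import hashlib
import re

# Hypothetical gazetteer: city name -> index of the crawling node that owns it.
# A real system would use a much larger geographic database.
CITY_TO_NODE = {"boston": 0, "chicago": 1, "seattle": 2}


def hash_partition(url: str, num_nodes: int) -> int:
    """Baseline: assign a URL to a node by hashing the URL string.

    MD5 is used only because it is stable across runs (unlike Python's
    built-in hash()); it is an assumption, not the paper's choice.
    """
    digest = hashlib.md5(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def geo_partition(url: str, anchor_text: str, num_nodes: int) -> int:
    """Assign a URL to the node owning the first city name found in the
    URL or in the extended anchor text; fall back to hashing otherwise.

    Only single-word city names are matched in this sketch.
    """
    tokens = re.findall(r"[a-z]+", (url + " " + anchor_text).lower())
    for token in tokens:
        if token in CITY_TO_NODE:
            return CITY_TO_NODE[token] % num_nodes
    return hash_partition(url, num_nodes)


if __name__ == "__main__":
    links = [
        ("http://www.example.com/chicago/restaurants", ""),
        ("http://example.com/page", "best pizza in Seattle"),
        ("http://example.com/about", "about us"),
    ]
    for url, anchor in links:
        print(url, "->", geo_partition(url, anchor, 3))
```

In this sketch the first link is routed by its URL, the second by its anchor text, and the third has no geographic cue, so it is placed by the hash fallback. Page content, which the abstract also lists as a feature, is left out to keep the example short.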