Location information on the Web is a precious asset for a multitude of applications and is becoming an increasingly important dimension in Web search. Even though more and more Web pages carry location information, they form only a small share of all pages and are scattered over the Web. To find and index location-related Web content efficiently, we propose a crawling strategy that retrieves the pages that are geospatially relevant while minimizing the number of non-relevant pages among those crawled. We address this challenge by extending the technique of focused crawling to exploit location references on Web pages, so that the crawl specifically targets geospatial topics. In this paper, we describe the design and development of a focused crawler with an adaptive geospatial focus that efficiently retrieves and identifies location-relevant documents on the Web. Drawing on geospatial features of both Web pages and the link graph, a crawl strategy based on Bayesian classifiers prioritizes promising links and pages, which leads to faster coverage of the desired geospatial topic and thus to fast creation of precise geospatial Web indexes. We present evaluations of the system's performance and share our findings on the geospatial Web graph and the distribution of location references on the Web.
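To make the idea of classifier-driven link prioritization concrete, the following is a minimal sketch and not the system described in the paper. It assumes that each outgoing link is represented by a bag of tokens from its anchor text or surrounding context, that a small two-class naive Bayes model estimates the probability that the link target is geospatially relevant, and that the crawl frontier is a priority queue ordered by that probability. The class names `NaiveBayesLinkScorer` and `Frontier`, the training tokens, and the example URLs are all hypothetical; the abstract does not specify the features or the exact classifier setup.

```python
import heapq
import math
from collections import Counter


class NaiveBayesLinkScorer:
    """Toy two-class multinomial naive Bayes over link-context tokens (assumed features)."""

    def __init__(self):
        self.token_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def train(self, tokens, is_geo):
        # Record one labeled example: tokens seen near a link, and whether
        # the linked page turned out to be geospatially relevant.
        self.token_counts[is_geo].update(tokens)
        self.doc_counts[is_geo] += 1

    def _log_score(self, tokens, label, vocab_size):
        counts = self.token_counts[label]
        total = sum(counts.values())
        n_docs = sum(self.doc_counts.values())
        # Smoothed class prior plus add-one smoothed token likelihoods.
        score = math.log((self.doc_counts[label] + 1) / (n_docs + 2))
        for token in tokens:
            score += math.log((counts[token] + 1) / (total + vocab_size))
        return score

    def prob_geo(self, tokens):
        vocab = set(self.token_counts[True]) | set(self.token_counts[False])
        vocab_size = len(vocab) + 1  # one extra slot for unseen tokens
        pos = self._log_score(tokens, True, vocab_size)
        neg = self._log_score(tokens, False, vocab_size)
        # Normalize the two log scores into a posterior probability.
        top = max(pos, neg)
        e_pos, e_neg = math.exp(pos - top), math.exp(neg - top)
        return e_pos / (e_pos + e_neg)


class Frontier:
    """Crawl frontier that always yields the highest-scoring unseen URL."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._order = 0  # tie-breaker so equal scores pop in insertion order

    def push(self, url, score):
        if url in self._seen:
            return
        self._seen.add(url)
        # heapq is a min-heap, so store the negated score.
        heapq.heappush(self._heap, (-score, self._order, url))
        self._order += 1

    def pop(self):
        neg_score, _, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self):
        return len(self._heap)


# Hypothetical training data and URLs, for illustration only.
scorer = NaiveBayesLinkScorer()
scorer.train(["hotel", "street", "berlin", "map"], True)
scorer.train(["contact", "address", "city", "directions"], True)
scorer.train(["login", "privacy", "terms"], False)
scorer.train(["cart", "checkout", "login"], False)

geo_score = scorer.prob_geo(["address", "map", "city"])
other_score = scorer.prob_geo(["login", "terms"])

frontier = Frontier()
frontier.push("http://example.org/account", other_score)
frontier.push("http://example.org/locations", geo_score)
first_url, first_score = frontier.pop()
```

In this sketch the link whose context mentions addresses and maps is popped first, even though it was pushed second. An adaptive crawler along these lines could call `train` again each time a fetched page is checked for location references, so that the ordering of the frontier improves as the crawl proceeds.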