This article compares several page ordering strategies for Web crawling under several metrics. The objective of these strategies is to download the most "important" pages "early" during the crawl. Because the coverage of modern search engines is small compared to the size of the Web, and indexing the whole Web is impossible for both theoretical and practical reasons, it is important to index at least the most important pages. We use data from actual Web pages to build Web graphs and run a crawler simulator on those graphs. As the Web is very dynamic, simulating the crawl is the only way to ensure that all the strategies considered are compared under the same conditions. We propose several page ordering strategies that are more efficient than both breadth-first search and strategies based on partial PageRank calculations.
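The simulation idea can be illustrated with a minimal sketch. It is not the authors' simulator or any of their proposed strategies. The toy graph `GRAPH`, the `crawl_backlink` ordering (download the frontier page with the most links seen so far), and the use of full-graph PageRank as the "importance" score are all assumptions made for illustration. The sketch replays two orderings on the same fixed graph and reports how much of the total importance has been downloaded after each step, which is the sense in which a strategy gets important pages "early".

```python
from collections import deque

# Hypothetical toy Web graph: page -> list of pages it links to.
GRAPH = {
    "home": ["a", "b", "c"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["d"],
    "d": ["hub", "home"],
    "hub": ["home", "e"],
    "e": [],
}

def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank; used here only as an assumed importance score."""
    n = len(graph)
    rank = {p: 1.0 / n for p in graph}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in graph}
        for p, outs in graph.items():
            # Dangling pages spread their rank uniformly over all pages.
            targets = outs if outs else list(graph)
            share = damping * rank[p] / len(targets)
            for q in targets:
                new[q] += share
        rank = new
    return rank

def crawl_bfs(graph, seed):
    """Breadth-first ordering: download pages in the order they are discovered."""
    order, seen, queue = [], {seed}, deque([seed])
    while queue:
        page = queue.popleft()
        order.append(page)
        for q in graph[page]:
            if q not in seen:
                seen.add(q)
                queue.append(q)
    return order

def crawl_backlink(graph, seed):
    """Illustrative ordering: pick the frontier page with the most
    links seen so far from already-downloaded pages."""
    order, done, frontier = [], set(), {seed: 0}
    while frontier:
        page = max(frontier, key=frontier.get)  # ties: earliest discovered
        del frontier[page]
        done.add(page)
        order.append(page)
        for q in graph[page]:
            if q not in done:
                frontier[q] = frontier.get(q, 0) + 1
    return order

def cumulative_importance(order, rank):
    """Fraction of total importance downloaded after each step of the crawl."""
    total, acc, curve = sum(rank.values()), 0.0, []
    for page in order:
        acc += rank[page]
        curve.append(acc / total)
    return curve

rank = pagerank(GRAPH)
for name, strategy in [("bfs", crawl_bfs), ("backlink", crawl_backlink)]:
    order = strategy(GRAPH, "home")
    curve = cumulative_importance(order, rank)
    print(name, order, [round(x, 2) for x in curve])
```

Because both strategies run on the same frozen graph from the same seed, any difference between the two curves comes from the ordering alone, which is the property a live crawl of a changing Web cannot guarantee.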