Topical crawlers are increasingly seen as a way to address the scalability limitations of universal search engines by distributing the crawling process across users, queries, or even client computers. The context available to such crawlers can guide the navigation of links, with the goal of efficiently locating highly relevant target pages. We developed a framework to fairly evaluate topical crawling algorithms under a number of performance metrics. Here we use that framework to evaluate algorithms that have proven highly competitive among those proposed in the literature and in our own previous research. In particular, we focus on the tradeoff between exploration and exploitation of the cues available to a crawler, and on adaptive crawlers that use machine learning techniques to guide their search. We find that the best performance is achieved by a novel combination of explorative and exploitative bias, and we introduce an evolutionary crawler that surpasses the performance of the best nonadaptive crawler after sufficiently long crawls. We also analyze the computational complexity of the various crawlers and discuss how performance and complexity scale with available resources. Evolutionary crawlers achieve high efficiency and scalability by distributing the work across concurrent agents, resulting in the best performance/cost ratio.
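
The tradeoff between exploration and exploitation can be pictured with a small sketch. The code below is an illustration under stated assumptions, not the crawlers evaluated in this work: it keeps a priority-ordered frontier and expands the `n` best links per step, so that `n = 1` is purely greedy and larger values let lower-ranked links be followed sooner. The function name `best_n_first`, the toy link graph, the relevance scores, and the rule that a link inherits the relevance of the page pointing to it are all hypothetical choices made for the example.

```python
import heapq

def best_n_first(graph, relevance, seeds, n, max_pages):
    """Visit up to max_pages pages, expanding the n best frontier links per step.

    graph:     dict mapping a URL to the list of URLs it links to (toy stand-in for fetching)
    relevance: dict mapping a URL to an assumed topical relevance score in [0, 1]
    n:         batch size; 1 is fully exploitative, larger values explore more
    """
    frontier = [(-1.0, url) for url in seeds]  # max-heap via negated priority
    heapq.heapify(frontier)
    queued = set(seeds)
    visited = []
    while frontier and len(visited) < max_pages:
        # Take the n highest-priority links before looking at any new ones.
        batch = [heapq.heappop(frontier) for _ in range(min(n, len(frontier)))]
        for _, url in batch:
            if len(visited) >= max_pages:
                break
            visited.append(url)
            # Assumption: an unseen link inherits the relevance of its parent page.
            for link in graph.get(url, []):
                if link not in queued:
                    queued.add(link)
                    heapq.heappush(frontier, (-relevance[url], link))
    return visited

# Hypothetical toy web: the path to "target" starts at a weakly relevant page "b".
graph = {
    "seed": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1"],
    "b1": ["target"],
}
relevance = {"seed": 0.5, "a": 0.9, "b": 0.2, "a1": 0.1,
             "a2": 0.1, "b1": 0.8, "target": 1.0}

greedy = best_n_first(graph, relevance, ["seed"], n=1, max_pages=10)
explorative = best_n_first(graph, relevance, ["seed"], n=2, max_pages=10)
print(greedy)
print(explorative)
```

In this toy setting the greedy run exhausts the links under the promising page `a` before it ever visits `b`, whereas the batch of two reaches `b` right after `a`. Real crawlers would replace the dictionary lookups with page fetches and a learned or lexical link-scoring function.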