MySpiders: Evolve Your Own Intelligent Web Crawlers

Authors:
Gautam Pant;Filippo Menczer
Affiliations:
Department of Management Sciences, The University of Iowa, Iowa City, IA 52242 gautam-pant@uiowa.edu;Department of Management Sciences, The University of Iowa, Iowa City, IA 52242 filippo-menczer@uiowa.edu
Venue:
Autonomous Agents and Multi-Agent Systems
Year:
2002

Abstract

The dynamic nature of the World Wide Web makes it a challenge to find information that is both relevant and recent. Intelligent agents can complement the power of search engines to meet this challenge. We present a Web tool called MySpiders, which implements an evolutionary algorithm that manages a population of adaptive crawlers browsing the Web autonomously. Each agent acts as an intelligent client on behalf of the user, driven by a user query and by textual and linkage clues in the crawled pages. Agents autonomously decide which links to follow, which clues to internalize, when to spawn offspring to focus the search near a relevant source, and when to starve. The tool is available to the public as a threaded Java applet. We discuss the development and deployment of such a system.
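
The agent life cycle described in the abstract (follow links, spawn near relevant pages, starve otherwise) can be illustrated with a small sketch. This is not the MySpiders implementation, which the abstract describes as a threaded Java applet. It is a toy Python model under stated assumptions: the in-memory `WEB` graph, the term-overlap `relevance` score, the energy bookkeeping, and the `cost` and `spawn_at` parameters are all hypothetical and chosen only for illustration. No network access is performed.

```python
import random

# Toy in-memory "web" (illustrative data only):
# page id -> (page text, [(anchor text, target page id), ...])
WEB = {
    "a": ("evolutionary web crawlers", [("adaptive crawlers", "b"), ("recipes", "c")]),
    "b": ("adaptive web crawlers and agents", [("intelligent agents", "d"), ("home", "a")]),
    "c": ("cooking recipes", [("intelligent agents", "d")]),
    "d": ("intelligent web agents", [("home", "a")]),
}


def relevance(text, query):
    """Assumed scoring rule: fraction of query terms that occur in the text."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms)


def crawl(query, seed, steps=20, cost=0.3, spawn_at=1.0, rng=None):
    """Run a population of energy-driven agents over WEB; return scored pages."""
    rng = rng or random.Random(0)
    population = [{"page": seed, "energy": 0.5}]
    scores = {}  # page id -> relevance, filled on first visit
    for _ in range(steps):
        survivors = []
        for agent in population:
            text, links = WEB[agent["page"]]
            # Only newly discovered pages pay out energy (assumption).
            if agent["page"] not in scores:
                scores[agent["page"]] = relevance(text, query)
                agent["energy"] += scores[agent["page"]]
            agent["energy"] -= cost  # every visit has a fixed cost
            if agent["energy"] <= 0 or not links:
                continue  # the agent starves (or is stuck) and is dropped
            if agent["energy"] >= spawn_at:
                # Spawn an offspring on the same page, splitting the energy,
                # so the search concentrates near a relevant source.
                agent["energy"] /= 2
                survivors.append({"page": agent["page"], "energy": agent["energy"]})
            # Choose the next link using the anchor text as a textual clue.
            weights = [relevance(anchor, query) + 0.1 for anchor, _ in links]
            agent["page"] = rng.choices(links, weights=weights)[0][1]
            survivors.append(agent)
        population = survivors
        if not population:
            break
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for page, score in crawl("web crawlers", seed="a"):
        print(page, round(score, 2))
```

Because energy is paid out only for pages not seen before, the population in this sketch eventually starves once the reachable relevant pages are exhausted, so the loop terminates without an explicit population cap.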