A topic-specific web crawler collects web pages relevant to topics of interest from the Internet. Much previous research has focused on algorithms for web page crawling. The main purpose of those algorithms is to gather as many relevant web pages as possible, and most of them describe only the first crawl. However, some important questions have not been addressed, such as how the crawler performs in subsequent crawling attempts and whether it can learn from experience to crawl more relevant web pages incrementally. In this paper, we present an algorithm that covers both the first and the consecutive crawls. To make the next crawl more efficient, we derive information from previous crawling attempts to build three knowledge bases: starting URLs, topic keywords, and URL prediction. These knowledge bases form the experience of the learnable topic-specific web crawler and are used to produce better results in the next crawl. Preliminary evaluation shows that the proposed web crawler can learn from experience to better collect the web pages of interest during the early period of consecutive crawling attempts.
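
The idea of carrying experience from one crawl to the next can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract names the three knowledge bases but does not say how they are built, so the class KnowledgeBase, its fields, the per-host relevance ratio used as a stand-in for URL prediction, the 0.5 default for unseen hosts, and the example URLs are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class KnowledgeBase:
    """Hypothetical container for experience kept between crawls."""
    starting_urls: list = field(default_factory=list)
    topic_keywords: Counter = field(default_factory=Counter)
    # host -> [relevant pages seen, total pages seen];
    # an assumed, very crude stand-in for URL prediction.
    host_stats: dict = field(default_factory=dict)

    def record(self, url, text, relevant):
        """Fold one crawled page into the knowledge base."""
        stats = self.host_stats.setdefault(urlparse(url).netloc, [0, 0])
        stats[1] += 1
        if relevant:
            stats[0] += 1
            # Relevant pages contribute keywords and become future seeds.
            self.topic_keywords.update(text.lower().split())
            if url not in self.starting_urls:
                self.starting_urls.append(url)

    def predict(self, url):
        """Estimated relevance of an unvisited URL (0.5 for unknown hosts)."""
        relevant, total = self.host_stats.get(urlparse(url).netloc, (0, 0))
        return relevant / total if total else 0.5


# "First crawl": record what was seen (relevance labels are made up here).
kb = KnowledgeBase()
kb.record("http://a.example/crawl", "focused web crawler", relevant=True)
kb.record("http://a.example/seeds", "topic crawler seeds", relevant=True)
kb.record("http://b.example/ads", "buy now", relevant=False)

# "Next crawl": order the frontier using the recorded experience.
frontier = ["http://b.example/x", "http://a.example/y", "http://c.example/z"]
ranked = sorted(frontier, key=kb.predict, reverse=True)
```

In this sketch the next crawl would start from kb.starting_urls and visit the frontier in the order given by ranked, so hosts that produced relevant pages earlier are tried first. A real crawler would need a proper relevance judgment and a richer prediction model than a per-host ratio.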