A focused crawler is an agent that concentrates on a particular target topic and tries to visit and gather only relevant pages from the Web. A crucial issue for a focused crawler is the underlying heuristic for deciding which page to visit next. The authors propose a rule-based approach to improve a baseline focused crawler's harvest rate and coverage. The baseline focused crawler employs a canonical topic taxonomy to train a naïve Bayes classifier, which then helps score unseen URLs. The authors explore using simple rules derived from interclass (topic) linkage patterns to decide the crawler's next move. The rule-based approach also enhances the baseline crawler by supporting tunneling. In initial performance results, the rule-based crawler improved on the harvest rate and coverage of the baseline crawler.
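As a rough illustration of the idea described above, the following Python sketch derives linkage rules from pairs of page classes and uses them to order a crawl frontier. It is a minimal sketch under stated assumptions, not the authors' implementation: the class names, the training pairs, the example URLs, and the helpers `learn_rules`, `rule_score`, `Frontier`, and `harvest_rate` are all hypothetical, and the rule is assumed to be a simple estimate of how often pages of one class link to pages of another. Page classes are assumed to come from a classifier that is not shown here.

```python
import heapq
from collections import defaultdict


def learn_rules(labeled_links):
    """Estimate P(child class | parent class) from (parent, child) class pairs."""
    counts = defaultdict(lambda: defaultdict(int))
    for parent_class, child_class in labeled_links:
        counts[parent_class][child_class] += 1
    rules = {}
    for parent_class, children in counts.items():
        total = sum(children.values())
        rules[parent_class] = {c: n / total for c, n in children.items()}
    return rules


def rule_score(page_class, target_class, rules):
    """Score a page's out-links by how often its class links to the target class."""
    return rules.get(page_class, {}).get(target_class, 0.0)


class Frontier:
    """Best-first URL queue: the highest-scoring unseen URL is popped first."""

    def __init__(self):
        self._heap = []
        self._seen = set()

    def push(self, url, score):
        if url not in self._seen:
            self._seen.add(url)
            # heapq is a min-heap, so the score is negated.
            heapq.heappush(self._heap, (-score, url))

    def pop(self):
        neg_score, url = heapq.heappop(self._heap)
        return url, -neg_score


def harvest_rate(relevant_fetched, total_fetched):
    """Fraction of fetched pages that were relevant to the target topic."""
    return relevant_fetched / total_fetched if total_fetched else 0.0


# Hypothetical training links between classified pages (invented for illustration).
training_links = [
    ("department", "course"), ("department", "course"), ("department", "sports"),
    ("sports", "sports"), ("course", "course"),
]
rules = learn_rules(training_links)

frontier = Frontier()
# Links found on an off-topic "department" page still score well for the
# "course" target, so the crawler can tunnel through that page.
frontier.push("http://example.org/dept/link", rule_score("department", "course", rules))
frontier.push("http://example.org/sports/link", rule_score("sports", "course", rules))
first_url, first_score = frontier.pop()
```

In this toy setup the link found on the "department" page is visited first, even though the page itself is not about the target topic. That is the tunneling behavior the abstract mentions: an off-topic page is followed because its class tends to lead to on-topic pages.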