Exploiting Interclass Rules for Focused Crawling

Authors: Ismail Sengor Altingovde; Ozgur Ulusoy
Affiliation: Bilkent University (both authors)
Venue: IEEE Intelligent Systems
Year: 2004

Abstract

A focused crawler is an agent that concentrates on a particular target topic and tries to visit and gather only relevant pages from the Web. A crucial issue for a focused crawler is the underlying heuristic for deciding the page to visit next. The authors propose a rule-based approach to improve a baseline focused crawler's harvest rate and coverage. The baseline focused crawler employs a canonical topic taxonomy to train a naïve-Bayesian classifier, which then helps score unseen URLs. The authors explore using simple rules derived from interclass (topic) linkage patterns to decide the crawler's next move. The rule-based approach also enhances the baseline crawler in supporting tunneling. In initial performance results, the rule-based crawler improved the harvest rate and coverage of the baseline crawler.
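
The rule-based scoring idea can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the class names, the probabilities in RULES, the helper names (rule_score, Frontier, enqueue_links), and the choice to score a link by the most probable chain of rules up to a fixed depth. The depth-limited chaining is meant to mimic tunneling, where a page of an off-topic class is still worth following because it tends to lead to the target class in a few hops.

```python
import heapq

# Hypothetical interclass rules: RULES[src][dst] is an assumed probability
# that a page classified as `src` links to a page classified as `dst`.
RULES = {
    "cycling": {"cycling": 0.7, "first_aid": 0.3},
    "first_aid": {"cycling": 0.4, "first_aid": 0.6},
    "hospital": {"first_aid": 0.5, "hospital": 0.5},
    "news": {"news": 0.9, "first_aid": 0.1},
}


def rule_score(page_class, target, depth=2):
    """Probability of the best rule chain from page_class to target.

    Considers chains of at most `depth` rules (an illustrative choice).
    """
    out = RULES.get(page_class, {})
    direct = out.get(target, 0.0)
    if depth <= 1:
        return direct
    # Chain through every class this page is known to link to.
    via = max(
        (p * rule_score(mid, target, depth - 1) for mid, p in out.items()),
        default=0.0,
    )
    return max(direct, via)


class Frontier:
    """Priority queue of URLs; the highest score is popped first."""

    def __init__(self):
        self._heap = []
        self._count = 0  # insertion counter, breaks ties in FIFO order

    def push(self, url, score):
        # heapq is a min-heap, so store the negated score.
        heapq.heappush(self._heap, (-score, self._count, url))
        self._count += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]


def enqueue_links(frontier, page_class, links, target, depth=2):
    """Score each outgoing link by the class of the page it was found on."""
    score = rule_score(page_class, target, depth)
    for url in links:
        frontier.push(url, score)
```

In a real crawler, page_class would come from the trained classifier and the rule probabilities would be estimated from observed links between classified pages; here both are hard-coded. With depth=1 a "hospital" page gives its links a score of 0 for the target "cycling", while depth=2 gives a nonzero score through "first_aid", which is the tunneling effect in miniature.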