Focused Crawling with Heterogeneous Semantic Information

Authors:
Rui Huang; Fen Lin; Zhongzhi Shi
Venue:
WI-IAT '08: Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, Volume 01
Year:
2008

Abstract

Focused crawlers selectively retrieve Web documents that are relevant to a predefined set of topics. To make informed predictions and decisions about which URLs and web pages are relevant, focused crawlers have adopted various topic models to represent topic-specific knowledge. Yet it is difficult to support semantic interoperability among these different models. Moreover, some manually specified semantic information, such as semantic markup and social annotations, has not been used effectively to improve crawling. This paper proposes to boost focused crawling with four kinds of semantic models and semantic information: thesauri, categories, ontologies, and folksonomies. A statistical semantic association model is proposed to integrate the different semantic models, represent heterogeneous semantic information, and support semantic relevance computation. A focused crawling framework is developed that uses both keyword-based content and the different kinds of additional semantic information for relevance prediction and ranking. Experiments show that the proposed model and framework effectively integrate heterogeneous semantic information for focused crawling.
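The abstract gives no formulas, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes that each semantic source (thesaurus, category scheme, ontology, folksonomy) can be reduced to a table mapping terms to concepts with a strength. It further assumes that these tables are merged with per-source weights, that a page's body text and its annotations (tags, markup labels) are each projected into concept space and compared with a topic vector by cosine similarity, and that the crawler is a best-first crawler in which each outlink inherits its parent page's relevance as its priority. The association tables, `SOURCE_WEIGHTS`, the blending parameter `alpha`, the pruning `threshold`, the toy `WEB`, and every function name are hypothetical. The paper's statistical semantic association model is not reproduced here, and the parent-score priority rule is the classic best-first heuristic swapped in for illustration.

```python
import heapq
import math
from collections import Counter, defaultdict

# HYPOTHETICAL association tables, one per semantic source named in the
# abstract: term -> {concept: strength}. Toy values, not from the paper.
SOURCES = {
    "thesaurus": {"car": {"automobile": 1.0}, "auto": {"automobile": 0.9}},
    "category": {"vehicles": {"automobile": 0.7}},
    "ontology": {"engine": {"automobile": 0.6}, "wheel": {"automobile": 0.4}},
    "folksonomy": {"cars": {"automobile": 0.8}, "recipe": {"cooking": 1.0}},
}
# HYPOTHETICAL per-source trust weights.
SOURCE_WEIGHTS = {"thesaurus": 1.0, "category": 0.8, "ontology": 0.7, "folksonomy": 0.5}


def build_association_model(sources, weights):
    """Merge per-source tables into a single weighted term -> concept map."""
    model = defaultdict(Counter)
    for name, table in sources.items():
        for term, concepts in table.items():
            for concept, strength in concepts.items():
                model[term][concept] += weights[name] * strength
    return model


def concept_vector(terms, model):
    """Project terms (body words, tags, markup labels) into concept space."""
    vec = Counter()
    for term, count in Counter(terms).items():
        for concept, weight in model.get(term, {}).items():
            vec[concept] += count * weight
    return vec


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def relevance(page, topic, model, alpha=0.5):
    """Blend keyword-content and annotation relevance; alpha is HYPOTHETICAL."""
    text_score = cosine(concept_vector(page["text"].lower().split(), model), topic)
    tag_score = cosine(concept_vector(page["tags"], model), topic)
    return alpha * text_score + (1 - alpha) * tag_score


def crawl(web, seeds, topic, model, budget=10, threshold=0.1):
    """Best-first crawl: each outlink is queued with its parent's relevance."""
    frontier = [(-1.0, url) for url in seeds]  # heapq is a min-heap, so negate
    heapq.heapify(frontier)
    seen, visited = set(seeds), []
    while frontier and len(visited) < budget:
        _, url = heapq.heappop(frontier)
        page = web[url]  # stands in for an HTTP fetch + parse
        score = relevance(page, topic, model)
        visited.append((url, round(score, 3)))
        if score < threshold:
            continue  # prune: do not expand links of irrelevant pages
        for link in page["links"]:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score, link))
    return visited


# Toy in-memory "web" so the sketch runs without network access.
WEB = {
    "a": {"text": "Car engine review", "tags": ["cars"], "links": ["b", "c"]},
    "b": {"text": "Pasta recipe", "tags": ["recipe"], "links": ["d"]},
    "c": {"text": "Auto wheel guide", "tags": [], "links": ["d"]},
    "d": {"text": "Car wheel", "tags": ["cars"], "links": []},
}

if __name__ == "__main__":
    model = build_association_model(SOURCES, SOURCE_WEIGHTS)
    topic = Counter({"automobile": 1.0})
    print(crawl(WEB, ["a"], topic, model))
```

In the toy run, page "c" scores only partially because it has no annotations, while the off-topic page "b" is visited but not expanded. Its link to "d" is therefore ignored, and "d" is reached through "c" instead. Scoring text and annotations separately and blending them is one simple way to let extra semantic information influence ranking. A real crawler would fetch pages over HTTP, extract links, and use a learned model rather than hand-set weights.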