Focused crawlers selectively retrieve Web documents that are relevant to a predefined set of topics. To predict which URLs and web pages are relevant, different topic models have been introduced to represent topic-specific knowledge. Yet it is difficult to support semantic interoperability among these models. Moreover, manually specified semantic information, such as semantic markup and social annotations, has not been used effectively to improve crawling. This paper proposes to boost focused crawling with four kinds of semantic models and semantic information: thesauruses, categories, ontologies, and folksonomies. A statistical semantic association model is proposed to integrate the different semantic models, represent heterogeneous semantic information, and support semantic relevance computation. A focused crawling framework is developed that uses both keyword-based content and the different kinds of additional information for relevance prediction and ranking. Experiments show that the proposed model and framework effectively integrate heterogeneous semantic information for focused crawling.
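As a rough illustration of the general idea of relevance-ranked crawling with term associations, the following minimal sketch expands a page's keywords with weighted associated terms and orders a crawl frontier by the resulting score. It is not the paper's model: the abstract does not specify how associations are estimated or combined, so the association weights, the cosine scoring, and all names here (`expand`, `relevance`, `Frontier`, the example URLs) are assumptions made for illustration only.

```python
import heapq
import math
from collections import Counter


def expand(terms, associations):
    """Build a term vector, adding associated terms at their (assumed) weights."""
    vec = Counter()
    for term in terms:
        vec[term] += 1.0
        for related, weight in associations.get(term, {}).items():
            vec[related] += weight
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def relevance(text, topic_vec, associations):
    """Score text against the topic using keywords plus associated terms."""
    tokens = text.lower().split()
    return cosine(expand(tokens, associations), topic_vec)


class Frontier:
    """Crawl frontier that yields the highest-scoring unseen URL first."""

    def __init__(self):
        self._heap = []
        self._seen = set()

    def push(self, url, priority):
        if url not in self._seen:
            self._seen.add(url)
            # heapq is a min-heap, so store the negated priority.
            heapq.heappush(self._heap, (-priority, url))

    def pop(self):
        neg_priority, url = heapq.heappop(self._heap)
        return url, -neg_priority

    def __len__(self):
        return len(self._heap)


# Hypothetical association weights, e.g. as might be derived from a thesaurus
# or folksonomy; the values are made up for this sketch.
associations = {"jaguar": {"cat": 0.8}}
topic_vec = expand(["cat"], associations)

frontier = Frontier()
frontier.push("http://example.org/markets",
              relevance("stock prices", topic_vec, associations))
frontier.push("http://example.org/wildlife",
              relevance("jaguar habitat", topic_vec, associations))
first_url, first_score = frontier.pop()
```

Here the anchor text "jaguar habitat" shares no keyword with the topic "cat", yet it outranks "stock prices" because of the assumed association between "jaguar" and "cat". A plain keyword match would score both at zero.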