An approach for selecting seed URLs of focused crawler based on user-interest ontology

  • Authors:
  • Yajun Du; Yufeng Hai; Chunzhi Xie; Xiaoming Wang

  • Venue:
  • Applied Soft Computing
  • Year:
  • 2014

Abstract

Seed URL selection for a focused Web crawler aims to steer the crawl toward relevant, valuable information that meets a user's personal information needs, and so to make information retrieval more effective. In this paper, we propose a seed URL selection approach based on a user-interest ontology. To enrich semantic queries, we first apply Formal Concept Analysis to construct user-interest concept lattices from user log profiles. By merging these concept lattices, we construct the user-interest ontology, which describes implicit concepts and the relationships between them more appropriately for semantic representation and query matching. We then use the user-interest ontology to extract the user's interest topic area and to expand user queries, retrieving the most related pages as seed URLs, which serve as the entry points of the focused crawler. In particular, we focus on how to refine the user topic area using a bipartite directed graph. The experiments show that the user-interest ontology can be built effectively by merging concept lattices, and that the proposed approach selects a high-quality collection of seed URLs and improves the average precision of the focused Web crawler.
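
As a rough illustration of the Formal Concept Analysis step named in the abstract, the following minimal sketch enumerates the formal concepts of a tiny, made-up user log. It is not the authors' implementation: the context (queries `q1` to `q3` and their terms), the function names, and the brute-force enumeration are all assumptions chosen for readability, and the abstract does not say how the lattice is actually computed, merged, or turned into an ontology.

```python
from itertools import combinations


def common_attributes(objects, context):
    """Return the attributes shared by every object in `objects`."""
    # Start from all attributes; intersecting over no objects leaves them all.
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return shared


def common_objects(attributes, context):
    """Return the objects that carry every attribute in `attributes`."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs of a small formal context.

    Brute force over object subsets: exponential, so only suitable for toy data.
    """
    concepts = set()
    objects = list(context)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(subset, context)
            extent = common_objects(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


# Hypothetical user log: each logged query (object) maps to its terms (attributes).
context = {
    "q1": {"python", "crawler"},
    "q2": {"crawler", "ontology"},
    "q3": {"python", "crawler", "ontology"},
}

concepts = formal_concepts(context)

# Print concepts from the most general (largest extent) to the most specific.
for extent, intent in sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1]))):
    print(sorted(extent), sorted(intent))
```

Each printed pair is one node of the concept lattice: a set of queries together with exactly the terms they all share. Ordering these nodes by extent inclusion gives the lattice structure that the paper then merges across users.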