A Topic-Specific Web Crawler with Concept Similarity Context Graph Based on FCA

  • Authors:
  • Yuekui Yang;Yajun Du;Jingyu Sun;Yufeng Hai

  • Affiliations:
  • School of Mathematical and Computers Science, Xihua University, Chengdu, China 610039;School of Mathematical and Computers Science, Xihua University, Chengdu, China 610039;College of Computer and Software, Taiyuan University of Technology, Taiyuan, China 030024;School of Mathematical and Computers Science, Xihua University, Chengdu, China 610039

  • Venue:
  • ICIC '08 Proceedings of the 4th international conference on Intelligent Computing: Advanced Intelligent Computing Theories and Applications - with Aspects of Artificial Intelligence
  • Year:
  • 2008

Abstract

With the Internet growing exponentially, topic-specific web crawlers are becoming increasingly popular in web data mining. The problem of how to order unvisited URLs has been studied in depth. We present the notion of a concept similarity context graph and propose a novel approach to topic-specific web crawling that calculates the prediction score of unvisited URLs from the similarity of concepts in Formal Concept Analysis (FCA), while improving retrieval precision and recall. We first build a concept lattice from the visited pages, extract from the lattice the core concepts that reflect the user's query topic, and then construct the concept similarity context graph based on the semantic similarities between the core concepts and the other concepts.
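
The pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the lattice construction algorithm, the rule for choosing core concepts, or the similarity measure. The sketch assumes a toy page-term context, enumerates formal concepts by brute force, treats as "core" any concept with a non-empty extent whose intent contains all query terms, and uses Jaccard overlap of intents as a stand-in for semantic similarity. All names (`context`, `formal_concepts`, `similarity_to_core`, and so on) are hypothetical.

```python
from itertools import combinations


def intent_of(pages, context):
    """Terms shared by every page in `pages` (all terms when `pages` is empty)."""
    shared = set().union(*context.values())
    for page in pages:
        shared &= context[page]
    return frozenset(shared)


def extent_of(terms, context):
    """Pages that contain every term in `terms`."""
    return frozenset(p for p, page_terms in context.items() if terms <= page_terms)


def formal_concepts(context):
    """Enumerate (extent, intent) pairs by closing every subset of pages.

    Brute force and exponential in the number of pages: for toy inputs only.
    """
    concepts = set()
    pages = list(context)
    for size in range(len(pages) + 1):
        for subset in combinations(pages, size):
            intent = intent_of(subset, context)
            concepts.add((extent_of(intent, context), intent))
    return concepts


def jaccard(a, b):
    """Assumed similarity measure: overlap of two term sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def core_concepts(concepts, query):
    """Assumed selection rule: non-empty extent and intent covering the query."""
    return [(ext, itt) for ext, itt in concepts if ext and query <= itt]


def similarity_to_core(concept, core):
    """Best intent similarity between `concept` and any core concept."""
    _, intent = concept
    return max((jaccard(intent, core_intent) for _, core_intent in core), default=0.0)


# Toy formal context: visited pages (objects) and their index terms (attributes).
context = {
    "p1": {"crawler", "topic", "web"},
    "p2": {"crawler", "web"},
    "p3": {"lattice", "concept"},
}
concepts = formal_concepts(context)
core = core_concepts(concepts, frozenset({"crawler"}))

# Rank all concepts by closeness to the core; layers of a context graph
# could then be formed by thresholding these scores (an assumption).
ranked = sorted(concepts, key=lambda c: similarity_to_core(c, core), reverse=True)
```

In this sketch, a prediction score for an unvisited URL could be derived from the score of the concept whose extent contains the page linking to it. That final step is likewise an assumption, since the abstract only states that scores are computed from concept similarity.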