Focused crawling of tagged web resources using ontology

  • Authors:
  • Punam Bedi;Anjali Thukral;Hema Banati

  • Affiliations:
  • Computer Science Department, University of Delhi, Delhi 110 007, India;Computer Science Department, University of Delhi, Delhi 110 007, India;Dyal Singh College, Computer Science Department, University of Delhi, Delhi, India

  • Venue:
  • Computers and Electrical Engineering
  • Year:
  • 2013

Abstract

Picking out web resources of interest from a large number of search results is a tedious task for any web user. Fortunately, social sites such as Social Bookmarking Sites (SBSs) allow web users to store their preferences and the search results that interest them in the form of bookmarks. Such sites, however, contain a lot of irrelevant data as noise, and predicting relevant URLs amid that noise is a real challenge. To address this challenge, this paper proposes a focused crawler, FCHC, that mimics a human cognitive search pattern to find potentially relevant web resources from an SBS. The focused crawler uses a domain-specific Concept Ontology to semantically expand a search topic and to determine the Semantic Relevance of tags. The crawler is tested with different search patterns on the 'database' domain and evaluated using a well-established metric, the harvest ratio. The performance of FCHC is analyzed and compared with that of focused crawlers that crawl the WWW with and without an ontology.
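
The harvest ratio named in the abstract is commonly defined as the fraction of crawled pages that are judged relevant to the topic. The abstract does not say how FCHC judges relevance, so the sketch below assumes each crawled resource already carries a boolean relevance label; the function name `harvest_ratio` and the sample data are hypothetical and only illustrate the usual definition.

```python
from typing import Iterable


def harvest_ratio(relevance_flags: Iterable[bool]) -> float:
    """Return relevant pages / total pages crawled.

    Assumption: each flag is True when a crawled page was judged
    relevant. How that judgment is made is outside this sketch.
    """
    flags = list(relevance_flags)
    if not flags:
        # No pages crawled: report 0.0 rather than dividing by zero.
        return 0.0
    return sum(flags) / len(flags)


# Hypothetical crawl of five pages, three of them judged relevant.
crawled = [True, True, False, True, False]
print(harvest_ratio(crawled))  # 0.6
```

A higher value means the crawler spent less of its effort on irrelevant pages, which is why the metric is a common basis for comparing focused crawlers.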