Sentiment-focused web crawling

  • Authors:
  • A. Gural Vural;B. Barla Cambazoglu;Pinar Senkul

  • Affiliations:
  • Middle East Technical University, Ankara, Turkey;Yahoo! Research, Barcelona, Spain;Middle East Technical University, Ankara, Turkey

  • Venue:
  • Proceedings of the 21st ACM international conference on Information and knowledge management
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

The sentiments and opinions that are expressed in web pages towards objects, entities, and products constitute an important portion of the textual content available in the Web. Despite the vast interest in sentiment analysis and opinion mining, somewhat surprisingly, the discovery of the sentimental or opinionated web content is mostly ignored. This work aims to fill this gap and address the problem of quickly discovering and fetching the sentimental content present in the Web. To this end, we design a sentiment-focused web crawling framework for faster discovery and retrieval of such content. In particular, we propose different sentiment-focused web crawling strategies that prioritize discovered URLs based on their predicted sentiment scores. Through simulations, these strategies are shown to achieve considerable performance improvement over general-purpose web crawling strategies in discovering sentimental content.