RetriBlog: a framework for creating blog crawlers

  • Authors:
  • Rafael Ferreira;Rinaldo Lima;Jean Melo;Evandro Costa;Fred Freitas;Henrique Pacca

  • Affiliations:
  • Federal University of Pernambuco, Recife, Brazil;Federal University of Pernambuco, Recife, Brazil;Federal University of Alagoas, Maceió, Brazil;Federal University of Alagoas, Maceió, Brazil;Federal University of Pernambuco, Recife, Brazil;Federal University of Alagoas, Maceió, Brazil

  • Venue:
  • Proceedings of the 27th Annual ACM Symposium on Applied Computing
  • Year:
  • 2012

Abstract

Blogs are becoming an important social tool. By means of blogs, bloggers share their likes and dislikes, express their opinions, report news, and form groups around shared subjects. Thus, the information available in the blogosphere can help in creating interesting applications in various domains, such as e-learning, e-commerce, and e-government. However, due to the increasing number of blogs posted every day on the Web and the dynamic nature of the blogosphere, collecting and extracting relevant information from blogs has become hard and time-consuming. In this paper, we use techniques from both the information retrieval and information extraction fields to deal with this problem. Since blogs have many points of variability, it is necessary to provide applications that can be easily adapted. We present the RetriBlog system, a framework for the development of blog crawlers that deals with the variations in blogs. This paper presents the details of RetriBlog and an evaluation of the proposed algorithms.