RetriBlog: An architecture-centered framework for developing blog crawlers

  • Authors:
  • Rafael Ferreira; Fred Freitas; Patrick Brito; Jean Melo; Rinaldo Lima; Evandro Costa

  • Affiliations:
  • Federal University of Pernambuco, Av. Prof. Moraes Rego, 1235, Cidade Universitária, Recife, PE 50670-901, Brazil (Ferreira, Freitas, Melo, Lima)
  • Federal University of Alagoas, Campus A. C. Simões, Av. Lourival Melo Mota, s/n, Cidade Universitária, Maceió, AL 57072-900, Brazil (Brito, Costa)

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2013

Abstract

Blogs have become an important social tool. They allow users to share their tastes, express their opinions, report news, and form groups around particular subjects, among other things. The information obtained from the blogosphere can be used to build applications in many fields. However, the growing number of blog posts published every day, together with the dynamic nature of the blogosphere, has made extracting relevant information from blogs difficult and time-consuming. In this paper, we use information retrieval and information extraction techniques to address this problem. Furthermore, because blogs have many variation points, applications built on them must be easy to adapt. Given this scenario, this work proposes RetriBlog, an architecture-centered framework for developing blog crawlers. Finally, it presents an evaluation of the proposed algorithms and three case studies.
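The abstract describes RetriBlog as an architecture-centered framework whose variation points let a blog crawler be adapted to different needs. The sketch below illustrates only that general idea under stated assumptions: a fixed crawl loop (the frozen spot) delegates fetching, content extraction, and indexing to pluggable components (the hot spots). Every class name here (`Fetcher`, `Extractor`, `Indexer`, `BlogCrawler`, and so on) is hypothetical and is not RetriBlog's actual API. The in-memory fetcher stands in for an HTTP client, and the tag-stripping extractor is a deliberately naive placeholder for the extraction algorithms the paper evaluates.

```python
import re
from abc import ABC, abstractmethod
from collections import defaultdict
from html.parser import HTMLParser


# Hypothetical variation point: how raw HTML for a URL is obtained.
class Fetcher(ABC):
    @abstractmethod
    def fetch(self, url: str) -> str: ...


# Hypothetical variation point: how post text is extracted from HTML.
class Extractor(ABC):
    @abstractmethod
    def extract(self, html: str) -> str: ...


# Hypothetical variation point: what is done with the extracted text.
class Indexer(ABC):
    @abstractmethod
    def add(self, url: str, text: str) -> None: ...


class InMemoryFetcher(Fetcher):
    """Serves pages from a dict; a stand-in for a real HTTP client."""

    def __init__(self, pages: dict[str, str]):
        self.pages = pages

    def fetch(self, url: str) -> str:
        return self.pages[url]


class _TextCollector(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts: list[str] = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


class TagStrippingExtractor(Extractor):
    """Naive extractor: keeps all visible text, not just the post body."""

    def extract(self, html: str) -> str:
        collector = _TextCollector()
        collector.feed(html)
        collector.close()
        return " ".join(collector.parts)


class InvertedIndex(Indexer):
    """Maps each lowercase word token to the set of URLs containing it."""

    def __init__(self):
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add(self, url: str, text: str) -> None:
        for term in re.findall(r"\w+", text.lower()):
            self.postings[term].add(url)

    def search(self, term: str) -> set[str]:
        return set(self.postings.get(term.lower(), set()))


class BlogCrawler:
    """Frozen spot: the crawl loop. Hot spots are injected components."""

    def __init__(self, fetcher: Fetcher, extractor: Extractor, indexer: Indexer):
        self.fetcher = fetcher
        self.extractor = extractor
        self.indexer = indexer

    def crawl(self, urls: list[str]) -> None:
        for url in urls:
            html = self.fetcher.fetch(url)
            self.indexer.add(url, self.extractor.extract(html))
```

Adapting such a crawler then means swapping one component, for example passing a different `Extractor` subclass, without touching `BlogCrawler.crawl`. Whether RetriBlog organizes its variation points exactly this way is not stated in the abstract.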