An architecture-centered framework for developing blog crawlers

  • Authors:
  • Rafael Ferreira;Patrick Brito;Jean Melo;Evandro Costa;Rinaldo Lima;Fred Freitas

  • Affiliations:
  • Federal University of Pernambuco, Recife, Brazil;Federal University of Alagoas, Maceió, Brazil;Federal University of Alagoas, Maceió, Brazil;Federal University of Alagoas, Maceió, Brazil;Federal University of Pernambuco, Recife, Brazil;Federal University of Pernambuco, Recife, Brazil

  • Venue:
  • Proceedings of the 27th Annual ACM Symposium on Applied Computing
  • Year:
  • 2012

Abstract

Blogs have become important tools for generating and sharing knowledge. In fact, activity on blogs doubles every two hundred days. Numerous applications could exploit this large daily flow of information to derive useful interpretations. However, the dynamic nature of the blogosphere makes manual information extraction impractical, which motivates the development of new automated approaches. In this paper, we propose a component-based, architecture-centered framework for building blog crawlers. The framework provides services for blog analysis, including preprocessing, indexing, content extraction, classification, and tag recommendation. In addition, we report a case study, a blog recommendation system that supports student interactions in educational forums. This work also aims to demonstrate how the proposed framework reduces the effort of creating a blog analysis application. Finally, other aspects of the developed application, such as the impact of system evolution, reusability, and instantiation cost, are discussed qualitatively.
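
As a rough illustration of the component-based idea described in the abstract, the sketch below chains a few interchangeable services (content extraction, preprocessing, tag recommendation) and builds a small index. It is not the authors' framework: the names `BlogPost`, `CrawlerPipeline`, `extract_content`, `preprocess`, and `recommend_tags`, the example URL, and the toy frequency-based tagging are all assumptions made for this example.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class BlogPost:
    """Hypothetical container for one crawled post."""
    url: str
    raw_html: str
    text: str = ""
    tokens: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)


# A component is any callable that takes a post and returns it, possibly enriched.
Component = Callable[[BlogPost], BlogPost]


def extract_content(post: BlogPost) -> BlogPost:
    # Crude content extraction: replace anything that looks like an HTML tag.
    post.text = re.sub(r"<[^>]+>", " ", post.raw_html).strip()
    return post


def preprocess(post: BlogPost) -> BlogPost:
    # Lowercase the text and keep alphabetic tokens only.
    post.tokens = re.findall(r"[a-z]+", post.text.lower())
    return post


def recommend_tags(post: BlogPost) -> BlogPost:
    # Toy tag recommendation (an assumption, not the paper's method):
    # the two most frequent tokens longer than three letters.
    counts: Dict[str, int] = {}
    for tok in post.tokens:
        if len(tok) > 3:
            counts[tok] = counts.get(tok, 0) + 1
    post.tags = sorted(counts, key=lambda t: (-counts[t], t))[:2]
    return post


class CrawlerPipeline:
    """Runs the configured components in order, then indexes the post."""

    def __init__(self, components: List[Component]) -> None:
        self.components = components
        self.index: Dict[str, List[str]] = {}  # token -> URLs containing it

    def process(self, post: BlogPost) -> BlogPost:
        for component in self.components:
            post = component(post)
        for tok in set(post.tokens):
            self.index.setdefault(tok, []).append(post.url)
        return post


pipeline = CrawlerPipeline([extract_content, preprocess, recommend_tags])
post = pipeline.process(
    BlogPost(
        url="http://example.org/p1",  # placeholder URL
        raw_html="<p>Crawlers index blogs. Crawlers parse blogs daily.</p>",
    )
)
print(post.tags)
```

Because each service is just a callable with the same signature, a different application could drop or swap a component (for example, replacing `recommend_tags` with a classifier) without touching the rest of the pipeline, which is the kind of reuse the abstract attributes to the framework.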