A framework for building web mining applications in the world of blogs: A case study in product sentiment analysis

  • Authors:
  • Evandro Costa; Rafael Ferreira; Patrick Brito; Ig Ibert Bittencourt; Olavo Holanda; Aydano Machado; Tarsis Marinho

  • Affiliations:
  • Computation Institute, Federal University of Alagoas, GrOW - Grupo de Otimização da Web, Postal 15.064, 57072-970 Maceió, AL, Brazil (all authors)

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2012

Quantified Score

Hi-index 12.05

Abstract

Recently there has been much interest in electronic commerce applications that use data mining techniques to explore datasets in the social media context. However, most of these applications have been developed in an ad hoc manner, mainly due to the lack of adequate tools, which makes them difficult to customize and time-consuming to construct and maintain. This work addresses these problems and proposes a software framework for building Web mining applications in the blog world. The architecture of the proposed framework combines blog crawling and data mining algorithms in order to provide a complete and flexible solution for building general-purpose Web mining applications. The framework's flexibility allows some important customizations, such as the construction of adapters for reading text from different blogs and the use of different pre-processing techniques and data mining algorithms. In order to improve the efficacy of information extraction from blogs, an ontology is used to describe the blogs. For this purpose, software agents are responsible for tracking and indexing blogs related to a specific tag and for mining blog datasets. Moreover, web services are used to encapsulate existing tools and maximize reuse. The framework has been instantiated to help blog users effectively find relevant information in the blog world. The focus of this paper is on describing the novel software architecture of the general framework (blog crawling and data mining), with detailed information about the data mining sub-framework, which uses semantic web services technology to automate service composition and constitutes the main research contribution.
A case study of an e-commerce application for analyzing user sentiment regarding specific products is reported. Its results consider the reduction in effort when creating a web mining application with the proposed integrated frameworks and existing data mining tools, as well as a qualitative analysis of quality aspects of the developed application, such as the impact of evolution.
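
The customization points named in the abstract (adapters for different blogs, interchangeable pre-processing techniques, interchangeable mining algorithms) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `BlogAdapter`, `InMemoryBlogAdapter`, `MiningPipeline`, the toy word lists, and the lexicon-based scorer are hypothetical and are not taken from the paper, which relies on agents, an ontology, and semantic web services rather than a plain in-process pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Protocol


class BlogAdapter(Protocol):
    """Hypothetical adapter interface: one implementation per blog source."""

    def read_posts(self) -> Iterable[str]:
        ...


@dataclass
class InMemoryBlogAdapter:
    """Stand-in adapter that serves posts from a list instead of a real blog."""

    posts: List[str]

    def read_posts(self) -> Iterable[str]:
        return list(self.posts)


def lowercase_tokens(text: str) -> List[str]:
    """Toy pre-processing step: split on whitespace, strip punctuation, lowercase."""
    tokens = (word.strip(".,!?;:").lower() for word in text.split())
    return [token for token in tokens if token]


# Toy sentiment lexicon, assumed for this sketch only.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}


def lexicon_sentiment(tokens: List[str]) -> str:
    """Toy mining step: compare counts of positive and negative lexicon hits."""
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


@dataclass
class MiningPipeline:
    """Wires one adapter, one pre-processing step, and one mining step together."""

    adapter: BlogAdapter
    preprocess: Callable[[str], List[str]]
    mine: Callable[[List[str]], str]

    def run(self) -> List[str]:
        return [self.mine(self.preprocess(post)) for post in self.adapter.read_posts()]


def demo() -> List[str]:
    adapter = InMemoryBlogAdapter(
        [
            "I love this phone, great battery!",
            "Terrible screen and poor support.",
            "It arrived on Tuesday.",
        ]
    )
    return MiningPipeline(adapter, lowercase_tokens, lexicon_sentiment).run()


if __name__ == "__main__":
    print(demo())
```

In this sketch, swapping the blog source, the pre-processing technique, or the mining algorithm means passing a different object to `MiningPipeline`; the pipeline code itself stays unchanged, which is the kind of flexibility the abstract describes.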