Machine learning techniques for business blog search and mining

  • Authors:
  • Yun Chen; Flora S. Tsai; Kap Luk Chan

  • Affiliations:
  • School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798, Singapore (all authors)

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2008

Quantified Score

Hi-index 12.07

Abstract

Weblogs, or blogs, have rapidly gained popularity over the past few years. In particular, the growth of business blogs, which are written by or provide commentary on businesses and companies, opens up new opportunities for developing blog-specific search and mining techniques. In this paper, we propose probabilistic models for blog search and mining using two machine learning techniques: latent semantic analysis (LSA) and probabilistic latent semantic analysis (PLSA). We implement the models on our database of business blogs, BizBlogs07, with the aim of achieving higher precision and recall. The probabilistic model is able to segment the business blogs into separate topic areas, which is useful for keyword detection in the blogosphere. Various term-weighting schemes and factor values were also studied in detail, revealing interesting patterns in our database of business blogs. Our multi-functional business blog system is found to differ markedly from existing blog search engines, as it aims to provide better relevance and precision in search results.
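
As a rough illustration of how LSA-based search with a term-weighting scheme can work, the sketch below builds a TF-IDF term-by-document matrix, reduces it with a truncated SVD, and ranks documents by cosine similarity to a query in the latent space. It is not the authors' implementation: the toy documents, the particular IDF variant, the number of factors (k = 2), and the function names tfidf_matrix, lsa_fit, and lsa_search are all assumptions made for this example, and PLSA is not covered.

```python
import numpy as np

def tfidf_matrix(docs):
    """Return a term-by-document TF-IDF matrix, a term index, and IDF weights."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    tf = np.zeros((len(vocab), len(docs)))
    for j, toks in enumerate(tokenized):
        for w in toks:
            tf[index[w], j] += 1.0
    df = np.count_nonzero(tf, axis=1)       # document frequency of each term
    idf = np.log(len(docs) / df) + 1.0      # one common IDF variant (assumed here)
    return tf * idf[:, None], index, idf

def lsa_fit(X, k):
    """Truncated SVD of X: returns U_k and each document's k-dim coordinates."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U_k = U[:, :k]
    doc_coords = (s[:k, None] * Vt[:k, :]).T  # same as (U_k^T X)^T, one row per doc
    return U_k, doc_coords

def lsa_search(query, index, idf, U_k, doc_coords):
    """Project the query into the latent space and rank documents by cosine."""
    q = np.zeros(len(index))
    for w in query.lower().split():
        if w in index:                        # out-of-vocabulary terms are ignored
            q[index[w]] += idf[index[w]]
    q_k = U_k.T @ q
    denom = np.linalg.norm(doc_coords, axis=1) * np.linalg.norm(q_k) + 1e-12
    scores = doc_coords @ q_k / denom
    order = np.argsort(-scores)
    return [(int(i), float(scores[i])) for i in order]

# Hypothetical toy corpus: two "product" posts and two "stock market" posts.
docs = [
    "apple launches new phone product",
    "company stock market shares rise",
    "new phone product review",
    "market shares fall as stock drops",
]
X, index, idf = tfidf_matrix(docs)
U_k, doc_coords = lsa_fit(X, k=2)
ranked = lsa_search("review", index, idf, U_k, doc_coords)
for doc_id, score in ranked:
    print(doc_id, round(score, 3), docs[doc_id])
```

In this toy setup the query term "review" occurs only in the third document, yet the first document is ranked alongside it because the two share a latent factor through co-occurring terms. Changing the weighting scheme or the number of factors changes which documents are grouped together, which is the kind of effect the abstract says was studied.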