Automatically extracting user reviews from forum sites

  • Authors:
  • Wei Liu;Hualiang Yan;Jianguo Xiao

  • Affiliations:
  • Institute of Scientific and Technical Information of China, Peking, 100038, China;Institute of Computer Science & Technology, Peking University, 100871, China;Institute of Computer Science & Technology, Peking University, 100871, China

  • Venue:
  • Computers & Mathematics with Applications
  • Year:
  • 2011

Abstract

User reviews on forum sites are an important information source for many popular applications (e.g., monitoring and analysis of public opinion), and they are usually represented in the form of structured records. To the best of our knowledge, little existing work in the literature has systematically investigated the problem of extracting user reviews from forum sites. Besides the variety of web page templates, user-generated reviews raise two new challenges. First, review contents are inconsistent in both their document object model (DOM) trees and their visual appearance, which weakens the similarity between review records. Second, the review content in a review record corresponds to complicated subtrees rather than to single nodes in the DOM tree. To tackle these challenges, we present WeRE, a system that performs automatic user review extraction. WeRE first extracts review records from web pages using the proposed level-weighted tree similarity algorithm, and then precisely extracts the review contents within each record by measuring node consistency. Experimental results on 20 forum sites indicate that WeRE achieves high extraction accuracy.
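The abstract does not give the formula for level-weighted tree similarity. The sketch below shows one plausible reading and should not be taken as the authors' algorithm. It assumes a weighted variant of the classic Simple Tree Matching (STM) algorithm in which a matched node at depth d contributes `decay ** d`, so mismatches deep inside a review's content subtree cost less than mismatches near the record root. It also assumes Dice-style normalization. The names `Node`, `weighted_stm`, `level_weighted_similarity` and the `decay` parameter are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Minimal DOM-like node: a tag name plus ordered children (hypothetical)."""
    tag: str
    children: list[Node] = field(default_factory=list)


def weighted_stm(a: Node, b: Node, depth: int = 0, decay: float = 0.5) -> float:
    """Simple Tree Matching where a matched node at `depth` scores decay**depth.

    Assumption: this depth-decaying weight stands in for the paper's
    (unspecified) level weighting.
    """
    if a.tag != b.tag:
        return 0.0
    m, n = len(a.children), len(b.children)
    # dp[i][j] = best weighted matching of the first i children of a
    # against the first j children of b, preserving sibling order.
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            pair = weighted_stm(a.children[i - 1], b.children[j - 1], depth + 1, decay)
            dp[i][j] = max(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1] + pair)
    return decay ** depth + dp[m][n]


def self_score(t: Node, depth: int = 0, decay: float = 0.5) -> float:
    """Weighted size of a tree: the score it would get when matched with itself."""
    return decay ** depth + sum(self_score(c, depth + 1, decay) for c in t.children)


def level_weighted_similarity(a: Node, b: Node, decay: float = 0.5) -> float:
    """Dice-style normalization into [0, 1]; identical trees score 1.0."""
    total = self_score(a, 0, decay) + self_score(b, 0, decay)
    return 2 * weighted_stm(a, b, 0, decay) / total


# Usage: two review records whose bodies differ slightly.
r1 = Node("div", [Node("p"), Node("p"), Node("span")])
r2 = Node("div", [Node("p"), Node("span")])
print(level_weighted_similarity(r1, r2))  # 2 * 2.0 / (2.5 + 2.0)

# A difference two levels down costs less than one directly under the root.
deep = level_weighted_similarity(Node("div", [Node("p", [Node("b")])]),
                                 Node("div", [Node("p", [Node("i")])]))
shallow = level_weighted_similarity(Node("div", [Node("p", [Node("b")])]),
                                    Node("div", [Node("span", [Node("b")])]))
print(deep > shallow)
```

In this sketch, the decay is what tolerates the inconsistent review bodies described above. Records that share the same upper-level template still score as similar even when the user-generated content below them varies. A smaller `decay` increases that tolerance.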