Template-independent wrapper for web forums

  • Authors:
  • Qi Zhang;Yang Shi;Xuanjing Huang;Lide Wu

  • Affiliations:
  • Fundan University, Shanghai, China;Fundan University, Shanghai, China;Fundan University, Shanghai, China;Fundan University, Shanghai, China

  • Venue:
  • Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Abstract

This paper presents a novel approach to extracting data from Web forums. Millions of users contribute rich information to Web forums every day, making them an important resource for many Web applications, such as product opinion retrieval and social network analysis. The novelty of the proposed algorithm is that it not only extracts the pure text but also distinguishes between the original post and replies. Experimental results on a large number of real Web forums indicate that the proposed algorithm can correctly ex