Blog post and comment extraction using information quantity of web format

  • Authors:
  • Donglin Cao; Xiangwen Liao; Hongbo Xu; Shuo Bai

  • Affiliations:
  • Institute of Computing Technology, Chinese Academy of Sciences, Beijing and Graduate School, the Chinese Academy of Sciences, Beijing and Dept. of Cognitive Science, Xiamen University, Xiamen, P.R ...; Institute of Computing Technology, Chinese Academy of Sciences, Beijing and Graduate School, the Chinese Academy of Sciences, Beijing; Institute of Computing Technology, Chinese Academy of Sciences, Beijing; Institute of Computing Technology, Chinese Academy of Sciences, Beijing

  • Venue:
  • AIRS'08 Proceedings of the 4th Asia information retrieval conference on Information retrieval technology
  • Year:
  • 2008

Abstract

As research on the blogosphere develops, extracting the post and comments from a blog page has become increasingly important for improving search performance. In this paper, we present a two-stage method. First, we combine the advantages of visual information and effective text information to locate the main text, which represents the theme of the blog page. Second, we use the information quantity of separators to detect the boundary between the post and the comments. In our experiments, this method achieves good extraction performance and improves the performance of blog search.