Automatic construction of polarity-tagged corpus from HTML documents

  • Authors:
  • Nobuhiro Kaji; Masaru Kitsuregawa

  • Affiliations:
  • The University of Tokyo, Tokyo, Japan; The University of Tokyo, Tokyo, Japan

  • Venue:
  • COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions
  • Year:
  • 2006

Abstract

This paper proposes a novel method of building a polarity-tagged corpus from HTML documents. The key characteristics of this method are that it is fully automatic and can be applied to arbitrary HTML documents. The idea behind our method is to exploit certain layout structures and linguistic patterns. Using them, we can automatically extract sentences that express opinions. In our experiment, the method constructed a corpus of 126,610 sentences.
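The abstract does not say which layout structures or linguistic patterns the method relies on. The sketch below is only an illustration under hypothetical assumptions: a heading (or `<dt>`) whose text is a polarity cue such as "Pros" or "Cons" introduces a list whose `<li>` items are opinion sentences of that polarity, and a sentence of the form "The good/bad point is that ..." carries its polarity in the wording. The cue table `POLARITY_CUES`, the regex `PATTERN`, and the helpers `LayoutExtractor`, `match_pattern`, and `build_corpus` are all invented for this example and are not the authors' implementation.

```python
from html.parser import HTMLParser
import re

# Hypothetical cue words: a heading whose text equals one of these keys is
# assumed to announce a list of statements with the given polarity.
POLARITY_CUES = {
    "pros": "positive",
    "good points": "positive",
    "cons": "negative",
    "bad points": "negative",
}

# Hypothetical linguistic pattern, e.g. "The good point is that it is light."
PATTERN = re.compile(r"^the (good|bad) points? (?:is|are) that (.+)$", re.IGNORECASE)

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "dt"}


class LayoutExtractor(HTMLParser):
    """Tags <li> text with the polarity announced by the preceding heading."""

    def __init__(self):
        super().__init__()
        self.current_polarity = None  # polarity of the most recent heading, if any
        self.in_heading = False
        self.in_item = False
        self.buffer = []
        self.tagged = []  # collected (sentence, polarity) pairs

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self.in_heading = True
            self.buffer = []
        elif tag == "li" and self.current_polarity:
            # Only list items under a polarity-cued heading are collected.
            self.in_item = True
            self.buffer = []

    def handle_endtag(self, tag):
        if self.in_heading and tag in HEADING_TAGS:
            self.in_heading = False
            heading = " ".join("".join(self.buffer).split()).lower().rstrip(":")
            # A heading without a cue ends the previous polarity section.
            self.current_polarity = POLARITY_CUES.get(heading)
        elif self.in_item and tag == "li":
            self.in_item = False
            text = " ".join("".join(self.buffer).split())
            if text:
                self.tagged.append((text, self.current_polarity))

    def handle_data(self, data):
        if self.in_heading or self.in_item:
            self.buffer.append(data)


def match_pattern(sentence):
    """Return (content, polarity) if the sentence fits PATTERN, else None."""
    m = PATTERN.match(sentence.strip())
    if not m:
        return None
    polarity = "positive" if m.group(1).lower() == "good" else "negative"
    return (m.group(2).rstrip("."), polarity)


def build_corpus(html, sentences=()):
    """Combine layout-based extraction with pattern-based extraction."""
    parser = LayoutExtractor()
    parser.feed(html)
    parser.close()
    corpus = list(parser.tagged)
    for s in sentences:
        hit = match_pattern(s)
        if hit:
            corpus.append(hit)
    return corpus


if __name__ == "__main__":
    page = """
    <h2>Pros</h2><ul><li>Long battery life</li><li>Bright screen</li></ul>
    <h2>Cons</h2><ul><li>Heavy</li></ul>
    <h2>Specs</h2><ul><li>13-inch display</li></ul>
    """
    for sentence, polarity in build_corpus(page, ["The good point is that it boots fast."]):
        print(polarity, sentence)
```

In this toy page the "Specs" list is skipped because its heading carries no cue, which mirrors the intuition that layout alone can signal which sentences are opinions and with what polarity, without any manual annotation.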