Contrasting objective and subjective Portuguese texts from heterogeneous sources

  • Authors:
  • Michel Généreux; William Martinez

  • Affiliations:
  • Centro de Linguística da Universidade de Lisboa (CLUL), Lisboa, Portugal; Instituto de Linguística Teórica e Computacional (ILTEC), Lisboa, Portugal

  • Venue:
  • HYBRID '12 Proceedings of the Workshop on Innovative Hybrid Approaches to the Processing of Textual Data
  • Year:
  • 2012

Abstract

This paper contrasts the content and form of objective and subjective texts. A collection of online newspaper news items serves as the objective texts, while parliamentary speeches (debates) and blog posts form the basis of the subjective texts, all in Portuguese. The aim is to identify general linguistic patterns used in objective written media and in subjective speeches and blog posts, to help construct domain-independent templates for information extraction and opinion mining. Our hybrid approach combines statistical data with linguistic knowledge to filter out irrelevant patterns. As resources for subjectivity classification are still limited for Portuguese, we build our subjective spoken corpus from a parallel corpus, using tools developed for English: annotations produced for the English side are projected onto the Portuguese side. A measure of n-gram saliency is used to extract relevant linguistic patterns deemed "objective" and "subjective". Perhaps unsurprisingly, our contrastive approach shows that, in Portuguese at least, subjective texts are characterized by markers such as descriptive, reactive and opinionated terms, while objective texts are characterized mainly by the absence of subjective markers.
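
The abstract does not name the saliency measure, so the following is only an illustrative sketch of how n-grams from two corpora might be contrasted. It assumes a smoothed log-ratio of relative frequencies as a stand-in measure. The function names (`ngrams`, `count_ngrams`, `saliency`), the whitespace tokenization and the toy sentences are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(texts, n):
    """Count n-grams over a list of texts (naive lowercase whitespace tokenization)."""
    counts = Counter()
    for text in texts:
        counts.update(ngrams(text.lower().split(), n))
    return counts


def saliency(subjective_texts, objective_texts, n=2):
    """Score each n-gram by a smoothed log-ratio of relative frequencies.

    ASSUMPTION: this log-ratio with add-one smoothing is a stand-in,
    not the measure used in the paper.
    Positive scores lean "subjective", negative scores lean "objective".
    """
    subj = count_ngrams(subjective_texts, n)
    obj = count_ngrams(objective_texts, n)
    vocab = set(subj) | set(obj)
    # Add-one smoothing: every n-gram in the joint vocabulary gets one extra count.
    subj_total = sum(subj.values()) + len(vocab)
    obj_total = sum(obj.values()) + len(vocab)
    scores = {}
    for gram in vocab:
        p_subj = (subj[gram] + 1) / subj_total
        p_obj = (obj[gram] + 1) / obj_total
        scores[gram] = math.log(p_subj / p_obj)
    return scores


# Toy data, invented for illustration only.
subjective = ["eu acho que isto é lamentável", "eu acho que é uma vergonha"]
objective = ["o governo aprovou o orçamento", "o ministro anunciou o orçamento"]

scores = saliency(subjective, objective, n=2)
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for gram, score in ranked[:3]:
    print(" ".join(gram), round(score, 3))
```

In this toy setting, bigrams such as "eu acho" receive positive scores and bigrams such as "o orçamento" receive negative ones. A real pipeline would also need proper tokenization and the linguistic filtering step that the abstract mentions.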