Analysing part-of-speech for Portuguese text classification

  • Authors:
  • Teresa Gonçalves, Cassiana Silva, Paulo Quaresma, Renata Vieira

  • Affiliations:
  • Dep. Informática, Universidade de Évora, Évora, Portugal (Teresa Gonçalves, Paulo Quaresma); Unisinos, São Leopoldo, RS, Brasil (Cassiana Silva, Renata Vieira)

  • Venue:
  • CICLing'06: Proceedings of the 7th International Conference on Computational Linguistics and Intelligent Text Processing
  • Year:
  • 2006

Abstract

This paper proposes and evaluates the use of linguistic information in the pre-processing phase of text classification. We present several experiments evaluating term selection based on different measures and on linguistic knowledge. To build the classifier we used Support Vector Machines (SVM), which are known to produce good results on text classification tasks. Our proposals were applied to two datasets written in Portuguese: articles from a Brazilian newspaper (Folha de São Paulo) and juridical documents from the Portuguese Attorney General’s Office. The results show that part-of-speech information is relevant to the pre-processing phase of text classification, allowing a strong reduction in the number of features the classifier needs.
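To make the idea of POS-based term selection before SVM training concrete, here is a minimal sketch under stated assumptions. It is not the authors' pipeline. It assumes the text has already been POS-tagged into (word, tag) pairs. The kept tag set `KEEP_TAGS` (using Universal Dependencies UPOS names), the toy Portuguese corpus, and every function name are hypothetical. A from-scratch Pegasos-style stochastic gradient linear SVM (no bias term) stands in for whatever SVM implementation the paper used. The term-weighting and selection measures the paper compares are not reproduced.

```python
from collections import Counter
import random

# Hypothetical choice of content-word tags to keep (Universal Dependencies UPOS
# names); the tag sets and combinations evaluated in the paper may differ.
KEEP_TAGS = {"NOUN", "PROPN", "ADJ", "VERB"}

# Toy corpus, assumed already POS-tagged as (word, tag) pairs.
# Label +1 = sports, -1 = politics (illustrative data, not from the paper).
TAGGED_DOCS = [
    [("O", "DET"), ("Benfica", "PROPN"), ("venceu", "VERB"), ("o", "DET"), ("jogo", "NOUN")],
    [("O", "DET"), ("golo", "NOUN"), ("decidiu", "VERB"), ("o", "DET"), ("jogo", "NOUN")],
    [("O", "DET"), ("governo", "NOUN"), ("aprovou", "VERB"), ("a", "DET"), ("lei", "NOUN")],
    [("O", "DET"), ("parlamento", "NOUN"), ("votou", "VERB"), ("a", "DET"), ("lei", "NOUN")],
]
LABELS = [1, 1, -1, -1]


def pos_filter(tagged_doc, keep_tags=KEEP_TAGS):
    """Keep lower-cased words whose POS tag is in keep_tags."""
    return [word.lower() for word, tag in tagged_doc if tag in keep_tags]


def build_vocab(docs):
    """Assign a feature index to each distinct term, in first-seen order."""
    vocab = {}
    for doc in docs:
        for term in doc:
            vocab.setdefault(term, len(vocab))
    return vocab


def vectorize(doc, vocab):
    """Sparse term-frequency vector {feature_index: count}; unknown terms are dropped."""
    counts = Counter(term for term in doc if term in vocab)
    return {vocab[term]: float(c) for term, c in counts.items()}


def dot(w, x):
    """Dot product of a dense weight list with a sparse vector."""
    return sum(w[i] * v for i, v in x.items())


def train_linear_svm(X, y, dim, lam=0.01, epochs=50, seed=0):
    """Pegasos-style SGD on the L2-regularised hinge loss (no bias term).

    Labels in y must be +1 or -1.
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for j in order:
            t += 1
            eta = 1.0 / (lam * t)               # Pegasos step size
            violated = y[j] * dot(w, X[j]) < 1  # margin check on the pre-update w
            shrink = 1.0 - 1.0 / t              # equals 1 - eta * lam
            w = [shrink * wi for wi in w]
            if violated:
                for i, v in X[j].items():
                    w[i] += eta * y[j] * v
    return w


def predict(w, x):
    """Sign of the linear score; a score of exactly 0 maps to -1."""
    return 1 if dot(w, x) > 0 else -1


if __name__ == "__main__":
    raw_vocab = build_vocab([[word.lower() for word, _ in d] for d in TAGGED_DOCS])
    filtered = [pos_filter(d) for d in TAGGED_DOCS]
    vocab = build_vocab(filtered)
    # prints: 12 features without POS filtering, 10 with it
    print(len(raw_vocab), "features without POS filtering,", len(vocab), "with it")
    X = [vectorize(d, vocab) for d in filtered]
    w = train_linear_svm(X, LABELS, len(vocab))
    print([predict(w, x) for x in X])  # prints: [1, 1, -1, -1]
```

On this toy corpus, dropping determiners shrinks the vocabulary from 12 to 10 terms while the classifier still separates the two classes. On real newspaper or legal text the reduction comes from discarding whole word classes across a large vocabulary; the actual figures for the paper's datasets are not reproduced here.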