Associative text categorization exploiting negated words

  • Authors:
  • Elena Baralis;Paolo Garza

  • Affiliations:
  • Politecnico di Torino, Corso Duca degli Abruzzi, Torino, Italy;Politecnico di Torino, Corso Duca degli Abruzzi, Torino, Italy

  • Venue:
  • Proceedings of the 2006 ACM symposium on Applied computing
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

Associative classification has been recently applied to text document categorization. However, differently from classification of structured data, the quality of the generated classifier is rather low. This effect is mainly due to the poor precision of generated rules.To increase the precision of associative classifiers we propose the use of classification rules including negated words, i.e. words that the considered document should not contain. Rules are in the form "If a document includes words A and B, but not word Z, then it belongs to class C1". Mining classification rules with negated words becomes quickly intractable when decreasing the support threshold. We tackle this problem by means of an opportunistic approach, where negated words are only generated to specialize rules that may wrongly classify training documents. Hence precision is increased, without losing recall.Experiments on the Reuters corpus show that our classifier based on negated words achieves good precision and recall results, while yielding an easily interpretable model typical of associative classifiers.