Olex: Effective Rule Learning for Text Categorization

  • Authors:
  • Pasquale Rullo; Veronica Lucia Policicchio; Chiara Cumbo; Salvatore Iiritano

  • Affiliations:
  • University of Calabria, Rende; University of Calabria, Rende; Exeura S.r.l., Rende; Exeura S.r.l., Rende

  • Venue:
  • IEEE Transactions on Knowledge and Data Engineering
  • Year:
  • 2009

Abstract

This paper describes Olex, a novel method for the automatic induction of rule-based text classifiers. Olex supports a hypothesis language of the form "if T_1 or ... or T_n occurs in document d, and none of T_{n+1}, ..., T_{n+m} occurs in d, then classify d under category c", where each T_i is a conjunction of terms. The proposed method is simple and elegant. Despite its simplicity, a systematic experimentation on the Reuters-21578, Ohsumed, and ODP data collections shows that Olex produces classifiers that are accurate, compact, and comprehensible. A comparative analysis against some of the best-known learning algorithms (namely, Naive Bayes, Ripper, C4.5, SVM, and Linear Logistic Regression) shows that Olex is more than competitive in terms of both predictive accuracy and efficiency.
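To make the hypothesis language concrete, the following is a minimal sketch of how a single rule of that form might be applied to a document. It covers only rule evaluation, not Olex's induction procedure. The class name `OlexStyleRule`, the helper `terms_of`, the lowercase whitespace tokenization, and the example "grain" rule are all hypothetical illustrations, not taken from the paper.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set

# A conjunction of terms: it "occurs" in a document only if every term occurs.
Conjunction = FrozenSet[str]


@dataclass
class OlexStyleRule:
    """Hypothetical container for a rule: if T_1 or ... or T_n occurs in d,
    and none of T_{n+1}, ..., T_{n+m} occurs in d, classify d under category."""
    category: str
    positive: List[Conjunction]  # T_1 ... T_n
    negative: List[Conjunction]  # T_{n+1} ... T_{n+m}

    def matches(self, doc_terms: Set[str]) -> bool:
        def occurs(conj: Conjunction) -> bool:
            # Subset test: all terms of the conjunction appear in the document.
            return conj <= doc_terms

        return any(occurs(t) for t in self.positive) and not any(
            occurs(t) for t in self.negative
        )


def terms_of(text: str) -> Set[str]:
    # Assumed preprocessing: lowercase and split on whitespace (no stemming).
    return set(text.lower().split())


# Hypothetical rule, written by hand for illustration only.
rule = OlexStyleRule(
    category="grain",
    positive=[frozenset({"wheat"}), frozenset({"corn", "export"})],
    negative=[frozenset({"futures"})],
)

if __name__ == "__main__":
    for text in ["Corn export volumes rose", "Wheat futures fell", "Oil prices rose"]:
        print(text, "->", rule.matches(terms_of(text)))
```

Here the first document is assigned to "grain" because the conjunction {corn, export} occurs and no negative conjunction does. The second is rejected because the negative term "futures" occurs even though "wheat" does, and the third matches no positive conjunction. Representing each T_i as a set keeps the occurrence check a simple subset test.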