Text categorization using an ensemble classifier based on a mean co-association matrix

  • Authors:
  • Luís Moreira-Matias;João Mendes-Moreira;João Gama;Pavel Brazdil

  • Affiliations:
  • Departamento de Engenharia Informática, Faculdade de Engenharia, Universidade do Porto, Porto, Portugal, LIAAD-INESC Porto L.A., Porto, Portugal;Departamento de Engenharia Informática, Faculdade de Engenharia, Universidade do Porto, Porto, Portugal, LIAAD-INESC Porto L.A., Porto, Portugal;LIAAD-INESC Porto L.A., Porto, Portugal, Faculdade de Economia, Universidade do Porto, Porto, Portugal;LIAAD-INESC Porto L.A., Porto, Portugal, Faculdade de Economia, Universidade do Porto, Porto, Portugal

  • Venue:
  • MLDM'12 Proceedings of the 8th international conference on Machine Learning and Data Mining in Pattern Recognition
  • Year:
  • 2012

Abstract

Text Categorization (TC) has attracted the attention of the research community over the last decade. Algorithms such as Support Vector Machines, Naïve Bayes, and k-Nearest Neighbors have been used with good performance, as confirmed by several comparative studies. Recently, several ensemble classifiers have also been introduced in TC. However, many of them can only provide a category for a given new sample. In this paper, we instead propose a methodology, MECAC, to build an ensemble of classifiers that has two advantages over other ensemble methods: 1) it can be run using parallel computing, saving processing time, and 2) it can extract important statistics from the obtained clusters. It uses the mean co-association matrix to solve binary TC problems. Our experiments revealed that our framework performed, on average, 2.04% better than the best individual classifier on the tested datasets. These results were statistically validated at a significance level of 0.05 using the Friedman Test.
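The abstract does not spell out how the mean co-association matrix is built, so the following is only a minimal sketch of the usual definition from ensemble clustering: entry (i, j) is the fraction of base models that assign samples i and j to the same group. The function name `mean_co_association` and the toy labelings are hypothetical and are not taken from the paper; MECAC itself may differ in its details.

```python
from typing import List, Sequence


def mean_co_association(labelings: Sequence[Sequence[int]]) -> List[List[float]]:
    """Average, over all base models, the 0/1 indicator that two samples share a label.

    Assumption: this is the generic co-association definition, not
    necessarily the exact construction used by MECAC.
    """
    if not labelings:
        raise ValueError("need at least one labeling")
    n = len(labelings[0])
    if any(len(lab) != n for lab in labelings):
        raise ValueError("all labelings must cover the same samples")

    matrix = [[0.0] * n for _ in range(n)]
    # Each base model contributes its own 0/1 co-association matrix.
    for lab in labelings:
        for i in range(n):
            for j in range(n):
                if lab[i] == lab[j]:
                    matrix[i][j] += 1.0

    # Dividing by the number of base models turns the counts into a mean.
    m = len(labelings)
    return [[value / m for value in row] for row in matrix]


# Hypothetical binary outputs of three base classifiers on four documents.
labelings = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
]
C = mean_co_association(labelings)
```

Under this definition each base model's contribution depends only on that model's own labels, so the per-model matrices could be computed independently and averaged afterwards, which is one plausible reading of the parallel-computing advantage the abstract mentions.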