A multistrategy approach for digital text categorization from imbalanced documents

  • Authors:
  • M. Dolores del Castillo;José Ignacio Serrano

  • Affiliations:
  • Instituto de Automática Industrial (CSIC), Madrid, Spain;Instituto de Automática Industrial (CSIC), Madrid, Spain

  • Venue:
  • ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets
  • Year:
  • 2004

Abstract

The goal of the research described here is to develop a multistrategy classifier system for document categorization. The system automatically discovers classification patterns by applying several empirical learning methods to different representations of preclassified documents drawn from an imbalanced sample. The learners work in parallel: each learner carries out its own feature selection based on evolutionary techniques and then builds a classification model. To classify a document, the system combines the learners' predictions, also by means of evolutionary techniques. The system relies on a modular, flexible architecture that makes no assumptions about the design or the number of the available learners and is independent of the thematic domain.
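
The abstract does not say which evolutionary techniques are used or how predictions are encoded, so the following is only a minimal sketch of the general idea of combining learner predictions with an evolved weighting. It assumes binary labels, 0/1 votes from each learner, a small genetic algorithm with truncation selection, one-point crossover and Gaussian mutation, and balanced accuracy as the fitness so that the minority class is not ignored. All names (`weighted_vote`, `fitness`, `evolve_weights`) and the toy data are hypothetical and are not taken from the paper.

```python
import random

def weighted_vote(weights, votes):
    """Combine 0/1 votes from several learners using per-learner weights."""
    score = sum(w for w, v in zip(weights, votes) if v == 1)
    return 1 if score * 2 > sum(weights) else 0

def fitness(weights, vote_rows, labels):
    """Balanced accuracy: mean per-class recall (both classes must be present)."""
    recalls = []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        hits = sum(weighted_vote(weights, vote_rows[i]) == cls for i in idx)
        recalls.append(hits / len(idx))
    return sum(recalls) / 2

def evolve_weights(vote_rows, labels, n_learners,
                   generations=30, pop_size=20, seed=0):
    """Search for combination weights with a small genetic algorithm.

    Assumption: n_learners >= 2, so that one-point crossover is defined.
    """
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_learners)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: fitness(w, vote_rows, labels), reverse=True)
        parents = pop[: pop_size // 2]          # keep the better half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_learners)  # one-point crossover
            child = a[:cut] + b[cut:]
            j = rng.randrange(n_learners)       # mutate one weight, clamp to [0, 1]
            child[j] = min(1.0, max(0.0, child[j] + rng.gauss(0, 0.2)))
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda w: fitness(w, vote_rows, labels))

# Toy imbalanced sample (hypothetical): 8 negatives, 2 positives.
# Learner 0 is always right; learners 1 and 2 always predict the majority class.
labels = [0] * 8 + [1] * 2
vote_rows = [[y, 0, 0] for y in labels]

best = evolve_weights(vote_rows, labels, n_learners=3)
print(fitness([1, 1, 1], vote_rows, labels))  # 0.5: unweighted majority vote misses every positive
print(fitness(best, vote_rows, labels))
```

Because the better half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next. In this toy setup an unweighted majority vote is dominated by the two learners that only predict the majority class, which is the kind of failure a learned combination is meant to avoid.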