Evolutionary ANNs for improving accuracy and efficiency in document classification methods

  • Authors:
  • Antonia Azzini; Paolo Ceravolo

  • Affiliations:
  • Dipartimento di Tecnologie dell'Informazione, Università degli Studi di Milano, Crema, Italy; Dipartimento di Tecnologie dell'Informazione, Università degli Studi di Milano, Crema, Italy

  • Venue:
  • KES'06 Proceedings of the 10th international conference on Knowledge-Based Intelligent Information and Engineering Systems - Volume Part III
  • Year:
  • 2006

Abstract

Approaches to document classification belong to two major families: similarity-based (crisp) classification methods and neural network (gradual) ones. For gradual techniques, a major open issue is controlling the dimension of the search space. While similarity-based methods identify clusters using the same number of variables used for document encoding, neural networks automatically identify the variables that cause distinctions among clusters. Therefore, the number of variables may vary depending on the documents' structure and content, and is difficult to estimate a priori. This paper proposes a hybrid classification method suitable for heterogeneous document bases like the ones commonly encountered in business and knowledge management applications. Our method is based on an evolutionary algorithm that tunes both the neural network's structure and its weights. While searching for the optimal network configuration, it is possible to determine the minimal number of variables needed to classify the given set of documents.
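
The general idea of evolving a network's structure and weights together can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It is a simplified stand-in that assumes the "structure" is only a mask selecting which input variables a single threshold unit may use, and that fitness is accuracy minus a small penalty per active input. The toy data and all names (`make_toy_documents`, `evolve`, `penalty`, and so on) are hypothetical.

```python
import random

def make_toy_documents(n_docs, n_features, rng):
    """Hypothetical data: only features 0 and 1 determine the class label."""
    docs = []
    for _ in range(n_docs):
        x = [rng.random() for _ in range(n_features)]
        label = 1 if x[0] + x[1] > 1.0 else 0
        docs.append((x, label))
    return docs

def predict(genome, x):
    """Single threshold unit that sees only the inputs enabled by the mask."""
    mask, weights, bias = genome
    s = bias + sum(w * xi for m, w, xi in zip(mask, weights, x) if m)
    return 1 if s > 0 else 0

def fitness(genome, docs, penalty=0.01):
    """Accuracy minus an assumed penalty for every active input variable."""
    mask = genome[0]
    accuracy = sum(predict(genome, x) == y for x, y in docs) / len(docs)
    return accuracy - penalty * sum(mask)

def random_genome(n_features, rng):
    mask = [rng.random() < 0.5 for _ in range(n_features)]
    weights = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
    return (mask, weights, rng.gauss(0.0, 1.0))

def mutate(genome, rng, flip_rate=0.1, sigma=0.3):
    """Perturb structure (flip mask bits) and weights (Gaussian noise)."""
    mask, weights, bias = genome
    new_mask = [(not m) if rng.random() < flip_rate else m for m in mask]
    new_weights = [w + rng.gauss(0.0, sigma) for w in weights]
    return (new_mask, new_weights, bias + rng.gauss(0.0, sigma))

def evolve(docs, n_features, generations=40, pop_size=30, seed=0):
    rng = random.Random(seed)
    population = [random_genome(n_features, rng) for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=lambda g: fitness(g, docs), reverse=True)
        history.append(fitness(population[0], docs))
        # Truncation selection: the better half survives unchanged (elitism).
        parents = population[: pop_size // 2]
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    best = max(population, key=lambda g: fitness(g, docs))
    return best, history

docs = make_toy_documents(200, 6, random.Random(42))
best, history = evolve(docs, n_features=6)
print("active inputs:", [i for i, m in enumerate(best[0]) if m])
print("best fitness, first and last generation:", history[0], history[-1])
```

Because the better half of each generation is carried over unchanged, the best fitness never decreases. The per-input penalty is what pushes the search toward fewer variables, and its value here is an arbitrary choice.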