Intelligent document classification

  • Authors:
  • Rafael A. Calvo;H. A. Ceccatto

  • Affiliations:
  • Instituto de Física Rosario (CONICET-UNR), 27 de Febrero 210bis, 2000 Rosario, Argentina. E-mail: rafa@ifir.edu.ar;Instituto de Física Rosario (CONICET-UNR), 27 de Febrero 210bis, 2000 Rosario, Argentina. E-mail: rafa@ifir.edu.ar

  • Venue:
  • Intelligent Data Analysis
  • Year:
  • 2000

Abstract

In this work we investigate some technical questions related to the application of neural networks in document classification. First, we discuss the effects of different averaging protocols for the $\chi^{2}$ statistic used to remove non-informative terms. This is an especially relevant issue for the neural network technique, which requires an aggressive dimensionality reduction to be feasible. Second, we estimate the importance of performance fluctuations due to inherent randomness in the training process of a neural network, a point not properly addressed in previous works. Finally, we compare the neural network results with those obtained using the best methods for this application. For this we optimize the network architecture by evaluating much larger nets than previously considered in similar studies in the literature.
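
As a rough illustration of what an "averaging protocol" for the $\chi^{2}$ statistic can look like, the sketch below scores each term against each category from a 2x2 contingency table and then collapses the per-category scores into a single value per term in two ways: taking the maximum over categories, or taking the average weighted by category frequency. The abstract does not say which protocols the authors compare, so these two common choices are an assumption, and the function names (chi2_term_category, chi2_scores, chi2_max, chi2_weighted_avg) and the toy corpus are hypothetical.

```python
def chi2_term_category(a, b, c, d):
    """chi^2 for one (term, category) pair from a 2x2 contingency table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: the term carries no information
    return n * (a * d - c * b) ** 2 / denom


def chi2_scores(docs, labels):
    """Return ({term: {category: chi2}}, {category: document count}).

    docs is a list of sets of terms, labels the matching category labels.
    """
    categories = sorted(set(labels))
    vocab = set().union(*docs)
    n = len(docs)
    cat_size = {cat: labels.count(cat) for cat in categories}
    scores = {}
    for term in vocab:
        with_term = [lab for doc, lab in zip(docs, labels) if term in doc]
        scores[term] = {}
        for cat in categories:
            a = with_term.count(cat)
            b = len(with_term) - a
            c = cat_size[cat] - a
            d = n - a - b - c
            scores[term][cat] = chi2_term_category(a, b, c, d)
    return scores, cat_size


def chi2_max(per_category):
    """Protocol 1 (assumed): keep the best score over all categories."""
    return max(per_category.values())


def chi2_weighted_avg(per_category, cat_size):
    """Protocol 2 (assumed): average weighted by category frequency."""
    total = sum(cat_size.values())
    return sum(cat_size[cat] / total * s for cat, s in per_category.items())


# Hypothetical toy corpus: each document is a set of terms.
docs = [{"ball", "game"}, {"ball", "team"}, {"vote", "law"}, {"vote", "game"}]
labels = ["sport", "sport", "politics", "politics"]

scores, cat_size = chi2_scores(docs, labels)
ranking = sorted(scores, key=lambda t: chi2_max(scores[t]), reverse=True)
```

Terms at the tail of such a ranking would be the ones dropped to shrink the input dimension of the network. With unbalanced categories the two protocols can rank terms differently, which is the kind of effect the abstract refers to.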