Improving performance of text categorization by combining filtering and support vector machines: Research Articles

  • Authors:
  • Irene Díaz; José Ranilla; Elena Montañes; Javier Fernández; Elías F. Combarro

  • Affiliations:
  • Artificial Intelligence Center, University of Oviedo, Campus de Viesques, Gijón (Asturias), Spain (all authors)

  • Venue:
  • Journal of the American Society for Information Science and Technology
  • Year:
  • 2004

Abstract

Text categorization is the process of assigning documents to a set of previously fixed categories. Much current research aims to automate this time-consuming task. Several different algorithms have been applied, and Support Vector Machines (SVM) have shown very good results. In this report, we try to show that filtering the words used by SVM before classification can improve overall performance. This hypothesis is tested systematically with three different measures of word relevance, on two different corpora (one of them considered in three different splits), and with both local and global vocabularies. The results show that filtering significantly improves the recall of the method, which in turn significantly improves overall performance. © 2005 Wiley Periodicals, Inc.
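
The filtering step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the three word-relevance measures, so information gain is used here purely as an assumed stand-in, and the names `information_gain`, `filter_vocabulary`, `project`, and the parameter `keep_fraction` are hypothetical. The sketch covers only the scoring and filtering of words for one category; the filtered documents would then be passed to an SVM, which is not shown.

```python
import math


def entropy(pos, neg):
    """Binary entropy (in bits) of a split with `pos` and `neg` documents."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(docs, labels, word):
    """Assumed relevance measure: information gain of `word` for one category.

    docs   -- list of sets of words, one set per document
    labels -- list of bools, True if the document belongs to the category
    """
    n = len(docs)
    n_pos = sum(labels)
    with_pos = sum(1 for d, y in zip(docs, labels) if word in d and y)
    with_neg = sum(1 for d, y in zip(docs, labels) if word in d and not y)
    without_pos = n_pos - with_pos
    without_neg = (n - n_pos) - with_neg
    n_with = with_pos + with_neg
    h_before = entropy(n_pos, n - n_pos)
    h_after = (n_with / n) * entropy(with_pos, with_neg) \
        + ((n - n_with) / n) * entropy(without_pos, without_neg)
    return h_before - h_after


def filter_vocabulary(docs, labels, keep_fraction):
    """Keep the top `keep_fraction` of words, ranked by the relevance measure."""
    vocab = set().union(*docs)
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(vocab, key=lambda w: (-information_gain(docs, labels, w), w))
    keep = max(1, math.ceil(len(ranked) * keep_fraction))
    return ranked[:keep]


def project(doc, vocab):
    """Drop every word of `doc` that did not survive filtering."""
    return doc & set(vocab)


# Toy corpus (invented for illustration): two documents in the category, two outside.
docs = [
    {"wheat", "price", "farm"},
    {"wheat", "export", "price"},
    {"bank", "rate", "price"},
    {"bank", "loan", "rate"},
]
labels = [True, True, False, False]

kept = filter_vocabulary(docs, labels, keep_fraction=0.4)
filtered_docs = [project(d, kept) for d in docs]
```

On this toy corpus the words that separate the two groups perfectly ("bank", "rate", "wheat") are retained, while "price", which occurs on both sides, is discarded. How aggressively to filter is a tuning choice; `keep_fraction` here is an illustrative knob, not a value taken from the article.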