Feature selection algorithms to improve documents' classification performance

  • Authors:
  • Pedro A. C. Sousa; João Paulo Pimentão; Bruno René D. Santos; Fernando Moura-Pires

  • Affiliations:
  • Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia, Caparica, Portugal; Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia, Caparica, Portugal; UNINOVA Instituto de Desenvolvimento de Novas Tecnologias, Caparica, Portugal; Universidade de Évora Departamento de Informática, Évora, Portugal

  • Venue:
  • AWIC'03 Proceedings of the 1st international Atlantic web intelligence conference on Advances in web intelligence
  • Year:
  • 2003

Abstract

This paper presents a study in which feature selection algorithms were evaluated as a means of improving document classification performance. The study was carried out within the DEEPSIA project (IST project Nr. 1999-20 283), funded by the European Union. Better document recognition was needed to increase the overall performance of the Framework for Internet data collection based on intelligent agents, used within the project. The Framework is briefly described and the learning techniques used are presented. The paper focuses on the feature selection algorithms, where the most relevant work was the use of Conditional Mutual Information, estimated using genetic algorithms because the computational complexity of CKN ruled out an iterative approach. Methods, techniques and comparative results are presented in detail.
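
The abstract does not give the estimator or the genetic algorithm design, so the following is only a minimal sketch of the general idea, under stated assumptions: documents are binary term-presence vectors, the fitness of a feature subset S is the empirical mutual information I(C; X_S) between the class and the joint value of the selected features (by the chain rule, this equals the sum of the conditional terms I(C; X_i | features chosen earlier)), and a small elitist genetic algorithm searches over fixed-size subsets. The names `mutual_information` and `ga_select`, the toy data and all parameter values are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import Counter


def mutual_information(rows, labels, subset):
    """Empirical I(C; X_S) in bits for the features indexed by `subset`."""
    n = len(rows)
    joint = Counter()          # counts of (feature values, class)
    feat = Counter()           # counts of feature values alone
    cls = Counter(labels)      # counts of each class
    for row, c in zip(rows, labels):
        key = tuple(row[i] for i in subset)
        joint[(key, c)] += 1
        feat[key] += 1
    mi = 0.0
    for (key, c), n_kc in joint.items():
        mi += (n_kc / n) * math.log2(n_kc * n / (feat[key] * cls[c]))
    return mi


def ga_select(rows, labels, k, pop_size=20, generations=30, seed=0):
    """Search for a k-feature subset with high I(C; X_S) (assumed GA design)."""
    rng = random.Random(seed)
    n_features = len(rows[0])

    def fitness(subset):
        return mutual_information(rows, labels, subset)

    # Each individual is a sorted tuple of k distinct feature indices.
    pop = [tuple(sorted(rng.sample(range(n_features), k)))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]      # elitism: best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover: draw k features from the union of both parents.
            child = set(rng.sample(sorted(set(a) | set(b)), k))
            # Mutation: occasionally swap one feature for another.
            if rng.random() < 0.3:
                child.discard(rng.choice(sorted(child)))
                outside = [i for i in range(n_features) if i not in child]
                child.add(rng.choice(outside))
            children.append(tuple(sorted(child)))
        pop = parents + children
    return max(pop, key=fitness)


# Toy data: 4 binary features, class = feature 0 XOR feature 1.
# Neither feature is informative alone; only the pair is.
rows = [tuple((i >> b) & 1 for b in range(4)) for i in range(16)]
labels = [r[0] ^ r[1] for r in rows]
best = ga_select(rows, labels, k=2)
print(best, mutual_information(rows, labels, best))
```

The XOR toy case illustrates why a measure that conditions on already-selected features can matter: features 0 and 1 each carry zero information about the class individually, so a ranking by per-feature mutual information would not single them out, while the subset-level score does.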