Enhancement of DTP feature selection method for text categorization

  • Authors:
  • Edgar Moyotl-Hernández;Héctor Jiménez-Salazar

  • Affiliations:
  • Facultad de Ciencias de la Computación, B. Universidad Autónoma de Puebla;Facultad de Ciencias de la Computación, B. Universidad Autónoma de Puebla

  • Venue:
  • CICLing'05 Proceedings of the 6th international conference on Computational Linguistics and Intelligent Text Processing
  • Year:
  • 2005

Abstract

This paper studies the structure of the vectors obtained when term selection methods are applied to a high-dimensional text collection. We found that the distance to transition point (DTP) method omits commonly occurring terms, which discriminate poorly between documents but convey important information about the collection. Experimental results on the Reuters-21578 collection with a k-NN classifier show that feature selection by DTP combined with common terms slightly outperforms simple document frequency.
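The abstract does not state the DTP formulas, so the following is only a minimal sketch under stated assumptions. It assumes the transition point formula often used in related work, TP = (sqrt(8 * I1 + 1) - 1) / 2, where I1 is the number of terms occurring exactly once. Terms are ranked by |frequency - TP|, with closer terms ranked higher. It also assumes that "common terms" means the terms with the highest document frequency (DF). The function names and the cut-offs n_dtp and n_common are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def transition_point(term_freqs):
    # Assumption: TP derived from I1, the number of terms with frequency 1.
    i1 = sum(1 for f in term_freqs.values() if f == 1)
    return (math.sqrt(8 * i1 + 1) - 1) / 2


def dtp_rank(term_freqs):
    # Rank terms by distance to the transition point (smaller is better);
    # ties are broken alphabetically so the ordering is deterministic.
    tp = transition_point(term_freqs)
    return sorted(term_freqs, key=lambda t: (abs(term_freqs[t] - tp), t))


def df_rank(docs):
    # Document frequency: the number of documents containing each term.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return sorted(df, key=lambda t: (-df[t], t))


def dtp_plus_common(docs, n_dtp, n_common):
    # Hypothetical combination: top-n_dtp DTP terms plus top-n_common DF terms.
    tf = Counter(tok for doc in docs for tok in doc)
    selected = []
    for t in dtp_rank(tf)[:n_dtp] + df_rank(docs)[:n_common]:
        if t not in selected:
            selected.append(t)
    return selected


docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
print(dtp_plus_common(docs, n_dtp=2, n_common=1))  # ['cat', 'sat', 'the']
```

In this toy corpus, DTP alone would not select the frequent term "the", because its frequency is far from TP. Adding the top DF term brings it back, which mirrors the omission the abstract describes.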