Enhancement of DTP feature selection method for text categorization

Authors:
Edgar Moyotl-Hernández;Héctor Jiménez-Salazar
Affiliations:
Facultad de Ciencias de la Computación, B. Universidad Autónoma de Puebla;Facultad de Ciencias de la Computación, B. Universidad Autónoma de Puebla
Venue:
CICLing'05 Proceedings of the 6th international conference on Computational Linguistics and Intelligent Text Processing
Year:
2005

Citing 6
Cited 3

A re-examination of text categorization methods

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
A vector space model for automatic indexing

Communications of the ACM
Concept decompositions for large sparse text data using clustering

Machine Learning
Machine learning in automated text categorization

ACM Computing Surveys (CSUR)
A Comparative Study on Feature Selection in Text Categorization

ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Experiments on the Use of Feature Selection and Negative Evidence in Automated Text Categorization

ECDL '00 Proceedings of the 4th European Conference on Research and Advanced Technology for Digital Libraries

A framework of feature selection methods for text categorization

ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 - Volume 2
Comparison of term frequency and document frequency based feature selection metrics in text categorization

Expert Systems with Applications: An International Journal
Clustering abstracts of scientific texts using the transition point technique

CICLing'06 Proceedings of the 7th international conference on Computational Linguistics and Intelligent Text Processing

Quantified Score

Hi-index	0.01

Visualization

Abstract

This paper studies the structure of vectors obtained by using term selection methods in high-dimensional text collection. We found that the distance to transition point (DTP) method omits commonly occurring terms, which are poor discriminators between documents, but which convey important information about a collection. Experimental results obtained on the Reuters-21578 collection with the k-NN classifier show that feature selection by DTP combined with common terms outperforms slightly simple document frequency.