Using typical testors for feature selection in text categorization

Authors:
Aurora Pons-Porrata;Reynaldo Gil-García;Rafael Berlanga-Llavori
Affiliations:
Center of Pattern Recognition and Data Mining, Universidad de Oriente, Santiago de Cuba, Cuba;Center of Pattern Recognition and Data Mining, Universidad de Oriente, Santiago de Cuba, Cuba;Computer Science, Universitat Jaume I, Castellón, Spain
Venue:
CIARP'07: Proceedings of the 12th Iberoamerican Congress on Pattern Recognition: Progress in Pattern Recognition, Image Analysis and Applications
Year:
2007

Abstract

A major difficulty of text categorization problems is the high dimensionality of the feature space. Thus, feature selection is often performed in order to increase both the efficiency and effectiveness of the classification. In this paper, we propose a feature selection method based on Testor Theory. This criterion takes into account inter-feature relationships. We experimentally compared our method with the widely used information gain using two well-known classification algorithms: k-nearest neighbour and Support Vector Machine. Two benchmark text collections were chosen as the testbeds: Reuters-21578 and Reuters Corpus Version 1 (RCV1-v2). We found that our method consistently outperformed information gain for both classifiers and both data collections, especially when aggressive feature selection is carried out.
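
The abstract does not define a testor or spell out the selection criterion. As a rough illustration only, the sketch below uses the classical definition from Testor Theory: a feature subset is a testor if no two objects from different classes look identical when restricted to those features, and it is typical if removing any single feature breaks that property. The function names and the toy term-presence data are hypothetical, and the brute-force enumeration is exponential in the number of features; it is not the authors' method and is meant only to show why the criterion depends on how features behave together rather than one at a time.

```python
from itertools import combinations


def is_testor(rows, labels, features):
    """True if no two rows from different classes agree on every selected feature."""
    for (a, label_a), (b, label_b) in combinations(zip(rows, labels), 2):
        if label_a != label_b and all(a[f] == b[f] for f in features):
            return False
    return True


def is_typical_testor(rows, labels, features):
    """A testor is typical (irreducible) if dropping any one feature breaks it.

    Checking single removals is enough because every superset of a testor
    is itself a testor.
    """
    if not is_testor(rows, labels, features):
        return False
    return all(
        not is_testor(rows, labels, [g for g in features if g != f])
        for f in features
    )


def typical_testors(rows, labels):
    """Brute-force enumeration over all non-empty feature subsets (toy sizes only)."""
    n_features = len(rows[0])
    return [
        set(subset)
        for size in range(1, n_features + 1)
        for subset in combinations(range(n_features), size)
        if is_typical_testor(rows, labels, subset)
    ]


# Hypothetical toy data: each row marks the presence (1) or absence (0) of three terms.
rows = [(1, 0, 1), (1, 1, 0), (0, 1, 1), (0, 0, 0)]
labels = ["a", "a", "b", "b"]

print(typical_testors(rows, labels))
```

In this toy data, terms 1 and 2 cannot separate the classes individually, yet together they form a typical testor. A univariate score such as information gain rates each term in isolation and so cannot capture this kind of joint behaviour.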