A Hybrid Approach to Optimize Feature Selection Process in Text Classification

Authors:
Roberto Basili;Alessandro Moschitti;Maria Teresa Pazienza
Venue:
AI*IA 01 Proceedings of the 7th Congress of the Italian Association for Artificial Intelligence on Advances in Artificial Intelligence
Year:
2001

References:
Term-weighting approaches in automatic text retrieval. Information Processing and Management: an International Journal.
An Evaluation of Statistical Approaches to Text Categorization. Information Retrieval.
A Comparative Study on Feature Selection in Text Categorization. ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning.
NLP-driven IR: evaluating performances over a text classification task. IJCAI'01 Proceedings of the 17th international joint conference on Artificial intelligence - Volume 2.

Abstract

Feature selection and weighting are primary activities of every learning algorithm for text classification. Traditionally, these tasks are carried out separately in two distinct phases: the first is global feature selection during corpus pre-processing, and the second is the application of the feature weighting model. This means that two (or more) different techniques are used to optimize performance, even though a single algorithm may have a better chance of making the right choices. When the complete feature set is available, the classifier learning algorithm can better relate complex features, such as linguistic ones (e.g. syntactic categories associated with words in the training material, or terminological expressions), to the suitable representation level. In [3] it was suggested that classifiers based on the generalized Rocchio formula can be used to weight features in category profiles in order to exploit the selectivity of linguistic information techniques in text classification. This paper describes a systematic study aimed at understanding the role of the Rocchio formula in the selection and weighting of linguistic features.
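
The abstract does not state the formula itself, so the following is only a minimal sketch of how a Rocchio-style category profile can weight and select features in one step. It assumes the commonly used form in which a feature's profile weight is beta times its mean weight in the category's documents minus gamma times its mean weight in all other documents, clipped at zero. The function name `rocchio_profile`, the toy documents, and the parameter values are illustrative assumptions, not taken from the paper.

```python
def rocchio_profile(docs, labels, category, beta=16.0, gamma=4.0):
    """Build a Rocchio-style profile for one category.

    docs   : list of dicts mapping feature -> document-level weight
    labels : list of category names, one per document
    beta, gamma : assumed illustrative defaults, not values from the paper
    """
    pos = [d for d, y in zip(docs, labels) if y == category]
    neg = [d for d, y in zip(docs, labels) if y != category]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative document")

    # Collect every feature seen in any document (dicts iterate over keys).
    features = set().union(*docs)

    profile = {}
    for f in features:
        pos_mean = sum(d.get(f, 0.0) for d in pos) / len(pos)
        neg_mean = sum(d.get(f, 0.0) for d in neg) / len(neg)
        weight = beta * pos_mean - gamma * neg_mean
        # Clipping at zero acts as implicit feature selection:
        # features with a non-positive weight are left out of the profile.
        if weight > 0:
            profile[f] = weight
    return profile


# Hypothetical toy corpus with binary feature weights.
docs = [
    {"goal": 1.0, "match": 1.0},
    {"goal": 1.0, "the": 1.0},
    {"stock": 1.0, "the": 1.0},
    {"market": 1.0, "the": 1.0},
]
labels = ["sport", "sport", "finance", "finance"]

sport_profile = rocchio_profile(docs, labels, "sport", beta=1.0, gamma=1.0)
```

In this toy run the non-discriminative feature "the" and the finance-only features receive non-positive weights and are dropped from the "sport" profile, which illustrates how a single weighting formula can also perform selection. Richer features such as syntactic categories or terminological expressions would enter the same way, as additional keys in the document dictionaries.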