Evaluation of feature combination approaches for text categorisation

Authors:
Robert Neumayer;Kjetil Nørvåg
Affiliations:
Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway;Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway
Venue:
ISMIS'11 Proceedings of the 19th international conference on Foundations of intelligent systems
Year:
2011

Citing 12
Cited 0

Machine learning in automated text categorization

ACM Computing Surveys (CSUR)
Machine Learning

Machine Learning
Condorcet fusion for improved retrieval

Proceedings of the eleventh international conference on Information and knowledge management
High-performing feature selection for text classification

Proceedings of the eleventh international conference on Information and knowledge management
A Comparative Study on Feature Selection in Text Categorization

ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
An extensive empirical study of feature selection metrics for text classification

The Journal of Machine Learning Research
Feature selection using linear classifier weights: interaction with classification models

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Combining feature selectors for text classification

CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
BNS feature scaling: an improved representation over tf-idf for svm text classification

Proceedings of the 17th ACM conference on Information and knowledge management
Reciprocal rank fusion outperforms condorcet and individual rank learning methods

Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
A hybrid approach for estimating document frequencies in unstructured P2P networks

Information Systems
Combination of feature selection methods for text categorisation

ECIR'11 Proceedings of the 33rd European conference on Advances in information retrieval

Quantified Score

Hi-index	0.00

Visualization

Abstract

Text categorisation relies heavily on feature selection. Both the possible reduction in dimensionality as well as improvements in classification performance are highly desirable. To the end of feature selection for text, a range of different methods have been developed, each having unique properties and selecting different features. However, it remains unclear which of them can be combined and what benefits this brings with it. In this paper we present correlation methods for the analysis of feature rankings and evaluate the combination of features according to these metrics. We further show results of an extensive study of feature selection approaches using a wide range of combination methods. We performed experiments on 19 test collections and report our findings.