This paper proposes feature selection strategies for digital news articles that allow learning algorithms for the unsupervised classification of news articles to be implemented effectively. With an appropriate selection of a small subset of features, related news items can be correctly identified, enabling organizations and individual users to keep track of current events. The paper defines a quality measure of the discriminatory power of each feature and verifies that selecting a feature subset with higher quality values yields good classification results. A selection method based on Particle Swarm Optimization (PSO) is also proposed. Both proposals are validated on two collections of press clippings gathered from news search services in digital media. Experimental results show that good classification accuracy can be achieved with small subsets containing between 3 and 6 per cent of the features.
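The abstract does not define the quality measure or the PSO encoding, so the following is only a minimal sketch under stated assumptions, not the authors' method. Per-feature quality scores are taken as given (the numbers are hypothetical). The filter proposal is approximated by keeping the top-scoring fraction of features (`select_top_fraction`, a hypothetical helper). The PSO proposal is approximated by a generic binary PSO (`binary_pso_select`, also hypothetical) in which each particle is a 0/1 feature mask and a sigmoid maps velocities to bit probabilities. Its fitness is a hypothetical stand-in that rewards summed quality and penalises subset size. The fitness the authors actually use is not stated in the abstract.

```python
import math
import random
from typing import Callable, List, Sequence


def select_top_fraction(quality: Sequence[float], fraction: float) -> List[int]:
    """Filter step (hypothetical helper): keep the indices of the
    highest-quality features, always keeping at least one."""
    k = max(1, round(fraction * len(quality)))
    ranked = sorted(range(len(quality)), key=lambda j: quality[j], reverse=True)
    return sorted(ranked[:k])


def binary_pso_select(
    n_features: int,
    fitness: Callable[[List[int]], float],
    n_particles: int = 20,
    n_iters: int = 100,
    w: float = 0.7,      # inertia weight
    c1: float = 1.5,     # pull toward the particle's own best mask
    c2: float = 1.5,     # pull toward the swarm's best mask
    v_max: float = 4.0,  # velocity clamp, keeps bit probabilities away from 0 and 1
    seed: int = 0,
) -> List[int]:
    """Generic binary PSO (a sketch, not the paper's variant): each particle is a
    0/1 mask over features, and a sigmoid turns each velocity component into the
    probability that the bit is 1. Returns the best mask found (max fitness)."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                v = max(-v_max, min(v_max, v))
                vel[i][d] = v
                # Sample the new bit with probability sigmoid(v).
                pos[i][d] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-v)) else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:  # gbest_fit >= every pbest, so nesting is safe
                    gbest, gbest_fit = pos[i][:], f
    return gbest


if __name__ == "__main__":
    # Hypothetical per-feature quality scores (not the paper's measure).
    quality = [0.9, 0.05, 0.8, 0.1, 0.95, 0.0]
    penalty = 0.2  # hypothetical cost per selected feature, favours small subsets

    def fitness(mask: List[int]) -> float:
        return sum(q for q, m in zip(quality, mask) if m) - penalty * sum(mask)

    print(select_top_fraction(quality, 0.5))  # [0, 2, 4]
    print(binary_pso_select(len(quality), fitness))
```

The size penalty is one simple way to bias the search toward small subsets like the 3 to 6 per cent reported above. Substituting a clustering-based score would only require passing a different `fitness` callable.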