Feature selection strategies for automated classification of digital media content

Authors:
Rocío Rocha; Ángel Cobo
Affiliations:
Department of Business Administration, University of Cantabria, Spain; Department of Applied Mathematics and Computational Sciences, University of Cantabria, Spain
Venue:
Journal of Information Science
Year:
2011

Abstract

This paper proposes feature selection strategies for digital news articles that allow learning algorithms to be applied effectively to the unsupervised classification of such articles. Selecting a small, appropriate subset of features is enough to identify related news items correctly, which lets organizations and individual users keep track of current events. The paper defines a quality measure of the discriminatory power of each feature and verifies that selecting a subset of features with higher quality values yields good classification results. A selection method based on Particle Swarm Optimization (PSO) is also proposed. Both proposals are validated on two collections of press clippings gathered from news search services in digital media. The experimental results show that good classification accuracy can be achieved with small subsets containing between 3 per cent and 6 per cent of the features.
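
The filter-style idea in the abstract (score each feature, then keep only a small top-ranked fraction) can be illustrated with a minimal sketch. The abstract does not state the paper's quality measure, so the code below substitutes the variance of each term's relative frequency across documents as an assumed stand-in. The function names `term_frequencies`, `variance_scores` and `select_top_fraction`, and the toy documents, are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def term_frequencies(docs):
    """Return (vocab, rows); rows[i][j] is the relative frequency of vocab[j] in docs[i]."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    rows = []
    for c in counts:
        total = sum(c.values()) or 1  # guard against empty documents
        rows.append([c[t] / total for t in vocab])
    return vocab, rows


def variance_scores(rows):
    """Assumed stand-in quality measure: variance of each term's frequency across documents."""
    n = len(rows)
    scores = []
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        mean = sum(col) / n
        scores.append(sum((x - mean) ** 2 for x in col) / n)
    return scores


def select_top_fraction(vocab, scores, fraction):
    """Keep the highest-scoring fraction of the vocabulary (at least one term)."""
    k = max(1, math.ceil(fraction * len(vocab)))
    ranked = sorted(range(len(vocab)), key=lambda j: (-scores[j], vocab[j]))
    return [vocab[j] for j in ranked[:k]]


# Toy collection with two obvious topics (illustrative data only).
docs = [
    "bank rates rise",
    "bank rates fall",
    "team wins match",
    "team loses match",
]
vocab, rows = term_frequencies(docs)
scores = variance_scores(rows)
selected = select_top_fraction(vocab, scores, 0.5)
print(sorted(selected))
```

On a realistic collection the `fraction` argument would be set in the 0.03 to 0.06 range reported in the abstract, and the reduced document vectors would then be passed to a clustering algorithm. The PSO-based method mentioned above searches over feature subsets rather than ranking features one at a time, and it is not reproduced here.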