Feature selection strategies for automated classification of digital media content

  • Authors:
  • Rocío Rocha; Ángel Cobo

  • Affiliations:
  • Department of Business Administration, University of Cantabria, Spain; Department of Applied Mathematics and Computational Sciences, University of Cantabria, Spain

  • Venue:
  • Journal of Information Science
  • Year:
  • 2011

Abstract

This paper proposes feature selection strategies for digital news articles that allow learning algorithms to be applied effectively to the unsupervised classification of those articles. With an appropriately selected small subset of features, related news items can be correctly identified, enabling organizations and individual users to keep track of current events. The paper defines a quality measure of the discriminatory power of each feature and verifies that selecting the feature subset with the highest quality values yields good classification results. A selection method based on Particle Swarm Optimization (PSO) is also proposed. Both proposals are validated on two collections of press clippings collated from news search services in digital media. Experimental results reveal that good classification accuracy can be achieved with small subsets of between 3 per cent and 6 per cent of the features.
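
The ranking idea in the abstract (score every feature, keep only a small top fraction) can be illustrated with a minimal sketch. The abstract does not give the paper's quality measure, so the score used here, the variance of a term's TF-IDF weight across documents, is only a stand-in assumption, and the function names `tfidf_matrix`, `stand_in_quality` and `select_top_features` are hypothetical. The sketch assumes every document contains at least one token.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocab, rows); rows[d][j] is the TF-IDF weight of vocab[j] in document d."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Number of documents in which each term occurs at least once.
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([
            (counts[t] / len(toks)) * math.log(n_docs / doc_freq[t])
            for t in vocab
        ])
    return vocab, rows


def stand_in_quality(column):
    """Stand-in score (NOT the paper's measure): variance of a term's weight."""
    mean = sum(column) / len(column)
    return sum((w - mean) ** 2 for w in column) / len(column)


def select_top_features(docs, fraction):
    """Rank terms by the stand-in score and keep the top `fraction` of the vocabulary."""
    vocab, rows = tfidf_matrix(docs)
    scores = {
        t: stand_in_quality([row[j] for row in rows])
        for j, t in enumerate(vocab)
    }
    k = max(1, round(fraction * len(vocab)))
    ranked = sorted(vocab, key=lambda t: scores[t], reverse=True)
    return ranked[:k]


docs = [
    "election vote the",
    "election poll the",
    "football goal the",
    "football match the",
]
selected = select_top_features(docs, 0.3)
```

Under this stand-in score, a term that occurs in every document (here "the") receives a weight of zero everywhere and is ranked last, so it is never kept in a small subset. Substituting the paper's own quality measure would only require replacing `stand_in_quality`.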