Features selection from high-dimensional web data using clustering analysis

  • Authors:
  • Héctor Menéndez;Gema Bello-Orgaz;David Camacho

  • Affiliations:
  • Universidad Autónoma de Madrid, Francisco Tomás y Valiente, Cantoblanco, Madrid;Universidad Autónoma de Madrid, Francisco Tomás y Valiente, Cantoblanco, Madrid;Universidad Autónoma de Madrid, Francisco Tomás y Valiente, Cantoblanco, Madrid

  • Venue:
  • Proceedings of the 2nd International Conference on Web Intelligence, Mining and Semantics
  • Year:
  • 2012

Abstract

Feature selection methodologies have become an important field within data preprocessing. These methods are applied to reduce the dimensionality of a dataset's attribute space in order to simplify its analysis. Classical techniques include wrapper approaches, heuristic functions and filters. The main problem with these approaches is that they are usually black-box and computationally expensive algorithms. This work presents a new, straightforward strategy to reduce the number of attributes. The new methodology takes the distribution of the variables into account and is oriented to clustering analysis. It makes both the attribute selection strategy and the resulting clusters easier for humans to interpret. Finally, the approach has been tested experimentally on the FIFA World Cup web dataset, a well-known social-based statistical dataset with a high number of variables, to show how the feature selection strategy finds the most relevant variables.
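The abstract contrasts the proposed strategy with classical filters. The paper's own distribution-based criterion is not described here, so the sketch below is not the authors' method. It is only a minimal illustration of the classical filter idea it is compared against: score each attribute independently of any learning algorithm (here by population variance, an assumed criterion) and keep those above a threshold. The function name `variance_filter`, the threshold and the toy data are hypothetical.

```python
from statistics import pvariance


def variance_filter(rows, names, threshold):
    """Hypothetical classical filter: keep attributes whose population
    variance exceeds `threshold` (not the paper's criterion).

    rows: list of equal-length numeric records, one per instance.
    names: attribute names, one per column.
    Returns the retained attribute names in their original order.
    """
    columns = list(zip(*rows))  # transpose records into per-attribute columns
    return [name for name, col in zip(names, columns)
            if pvariance(col) > threshold]


# Hypothetical toy data: three attributes, the second one is constant.
rows = [
    [1.0, 5.0, 10.0],
    [2.0, 5.0, 30.0],
    [3.0, 5.0, 20.0],
]
names = ["attr_a", "attr_b", "attr_c"]
print(variance_filter(rows, names, threshold=0.5))  # ['attr_a', 'attr_c']
```

Because each attribute is scored on its own, a filter like this is cheap and easy to inspect. However, it ignores how attributes interact within the clusters, which is the gap the clustering-oriented strategy in this work aims to address.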