Features selection from high-dimensional web data using clustering analysis

Authors:
Héctor Menéndez;Gema Bello-Orgaz;David Camacho
Affiliations:
Universidad Autónoma de Madrid, Francisco Tomás y Valiente, Cantoblanco, Madrid;Universidad Autónoma de Madrid, Francisco Tomás y Valiente, Cantoblanco, Madrid;Universidad Autónoma de Madrid, Francisco Tomás y Valiente, Cantoblanco, Madrid
Venue:
Proceedings of the 2nd International Conference on Web Intelligence, Mining and Semantics
Year:
2012

Abstract

Feature selection methodologies have become an important part of data preprocessing. These methods are applied to reduce the number of attributes in a dataset in order to simplify its analysis. Classical techniques include wrapper approaches, heuristic functions, and filters. The main problem with these approaches is that they are usually black-box and computationally expensive algorithms. This work presents a new, straightforward strategy for reducing the dimension of the attributes. The methodology takes the distribution of the variables into account and is oriented to clustering analysis. It allows easier human interpretation of both the attribute selection strategy and the resulting clusters. Finally, the approach has been experimentally tested on the FIFA World Cup web dataset, a well-known social-based statistical dataset with a large number of variables, to show how the feature selection strategy finds the most relevant variables.