Guilt-by-association feature selection: Identifying biomarkers from proteomic profiles

Authors:
Hyunjin Shin; Bryan Sheu; Maria Joseph; Mia K. Markey
Affiliations:
Department of Electrical and Computer Engineering, The University of Texas at Austin, United States; Department of Biomedical Engineering, The University of Texas, United States; Department of Computer Science, The University of Texas at Austin, United States; Department of Biomedical Engineering, The University of Texas, United States
Venue:
Journal of Biomedical Informatics
Year:
2008

Abstract

In recent years, proteomic profiling by mass spectrometry has opened up a new realm of methods for identifying potential biomarkers. Mass spectrometry data, like other proteomic and genomic data, are challenging to analyze because they combine high dimensionality with few available samples. Feature selection is therefore extremely important: by choosing a subset of effective features that separate diseased samples from healthy ones, it directly provides a list of potential biomarkers. The rule of thumb for feature selection is that features must be both discriminant and independent to best separate the two groups. In general, however, existing feature selection algorithms take into account only the discrimination ability of features. In this paper, we present a novel feature selection method, guilt-by-association feature selection (GBA-FS), which makes it possible to select features that are independent as well as discriminant. After measuring similarities between features, the algorithm groups similar features together using a clustering algorithm and selects the best representative feature from each group. As a result, it produces a list of discriminant and independent features. The efficacy of GBA-FS was extensively tested on two real-world SELDI-TOF data sets. The experimental results demonstrate that GBA-FS helps select more independent features than a common filter-type feature selection method, the t-test. The results also show that GBA-FS can be used to deconvolve multiply charged states of the same protein molecules. Because GBA-FS successfully identifies feature groups with similar mass values, it can also be employed as an alternative to peak detection when preprocessing mass spectrometry data.
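
The three steps described in the abstract (measure similarity between features, cluster similar features, keep the best representative of each cluster) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not name the similarity measure, the clustering algorithm, or the criterion for the "best" feature, so Pearson correlation, average-linkage hierarchical clustering, and the absolute Welch t-statistic are assumptions made here. The function name `gba_fs_sketch` and the parameter `n_groups` are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from scipy.stats import ttest_ind


def gba_fs_sketch(X, y, n_groups):
    """Pick one discriminant feature per group of mutually similar features.

    X: (n_samples, n_features) intensity matrix; y: binary labels (0/1).
    Assumptions (not taken from the paper): Pearson correlation as the
    similarity, average-linkage clustering, |Welch t| as the ranking score.
    Constant features are not handled (their correlation is undefined).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    # Step 1: feature-feature similarity, turned into a distance.
    corr = np.corrcoef(X, rowvar=False)
    dist = np.clip(1.0 - np.abs(corr), 0.0, None)
    np.fill_diagonal(dist, 0.0)

    # Step 2: group similar features into at most n_groups clusters.
    tree = linkage(squareform(dist, checks=False), method="average")
    groups = fcluster(tree, t=n_groups, criterion="maxclust")

    # Step 3: score how well each feature separates the two classes.
    t_stat, _ = ttest_ind(X[y == 1], X[y == 0], axis=0, equal_var=False)
    score = np.abs(t_stat)

    # Step 4: keep the highest-scoring feature of each group.
    selected = []
    for g in np.unique(groups):
        members = np.flatnonzero(groups == g)
        selected.append(int(members[np.argmax(score[members])]))

    # Return representatives ordered from most to least discriminant.
    selected.sort(key=lambda j: -score[j])
    return selected, groups
```

Ranking by the t-statistic alone would tend to return several near-duplicate features from the same cluster (for example, neighboring m/z bins of one peak); selecting a single representative per cluster is what enforces the independence the abstract emphasizes.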