Sparse nonnegative matrix factorization for protein sequence motif discovery

Authors:
Wooyoung Kim;Bernard Chen;Jingu Kim;Yi Pan;Haesun Park
Affiliations:
Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA;Department of Computer Science, University of Central Arkansas, Conway, AR 72035, USA;School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA;Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA;School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
Venue:
Expert Systems with Applications: An International Journal
Year:
2011

Abstract

Discovering motifs in protein sequences is a critical and challenging task in bioinformatics. The task involves clustering relatively similar protein segments from a huge collection of protein sequences and culling high-quality motifs from the resulting clusters. A granular computing strategy combined with the K-means clustering algorithm was previously proposed for this task, but it requires manual selection of biologically meaningful clusters to serve as the initial condition. This manipulated clustering method is undisciplined as well as computationally expensive. In this paper, we use sparse non-negative matrix factorization (SNMF) to cluster a large protein data set. We show how to combine this method with the Fuzzy C-means algorithm and incorporate bio-statistics information to increase the number of clusters with high structural similarity. Our experimental results show that the SNMF approach provides better protein groupings in terms of secondary-structure similarity while maintaining similarity in the primary sequences.
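
The abstract does not state the SNMF objective or the solver the authors used, so the following is only a minimal sketch of how a sparse NMF can be used for clustering. It assumes a data matrix A with one column per protein segment, an L1 penalty on the coefficient matrix H, a Frobenius penalty on the basis matrix W, and plain multiplicative updates; these choices, the function names `sparse_nmf` and `cluster_labels`, and all parameter values are illustrative assumptions, not the paper's method. Each segment is assigned to the cluster whose coefficient in H is largest.

```python
import numpy as np


def sparse_nmf(A, k, beta=0.1, eta=0.1, n_iter=500, seed=0, eps=1e-9):
    """Approximate a nonnegative A (features x segments) as W @ H.

    Assumed objective (illustrative, not taken from the paper):
        0.5 * ||A - W H||_F^2 + 0.5 * eta * ||W||_F^2 + beta * sum(H)
    minimized with multiplicative updates, which keep W and H nonnegative.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k)) + eps  # basis vectors, one per cluster
    H = rng.random((k, n)) + eps  # sparse coefficients, one column per segment
    for _ in range(n_iter):
        # The L1 penalty on H adds the constant beta to the denominator.
        H *= (W.T @ A) / (W.T @ W @ H + beta + eps)
        # The Frobenius penalty on W adds eta * W to the denominator.
        W *= (A @ H.T) / (W @ (H @ H.T) + eta * W + eps)
    return W, H


def cluster_labels(H):
    """Assign each column (segment) to the row of H with the largest value."""
    return np.argmax(H, axis=0)


def make_toy_data(seed=1):
    """Hypothetical toy data: 6 features, 8 segments, two clear groups."""
    rng = np.random.default_rng(seed)
    A = 0.01 * rng.random((6, 8))
    A[:3, :4] += 1.0  # segments 0-3 use features 0-2
    A[3:, 4:] += 1.0  # segments 4-7 use features 3-5
    return A


if __name__ == "__main__":
    A = make_toy_data()
    W, H = sparse_nmf(A, k=2)
    print(cluster_labels(H))
```

On real data, the columns of A would be encoded sequence segments (for example, frequency profiles), and the resulting hard labels could serve as a starting point for a further clustering stage; how the paper couples SNMF with Fuzzy C-means is not specified in the abstract and is not reproduced here.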