Normalized Feature Vectors: A Novel Alignment-Free Sequence Comparison Method Based on the Numbers of Adjacent Amino Acids

  • Authors:
  • Hong-Jie Yu; De-Shuang Huang

  • Affiliations:
  • Tongji University, Shanghai; Anhui Science and Technology University, Fengyang

  • Venue:
  • IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
  • Year:
  • 2013

Abstract

Based on all 400 kinds of adjacent amino acids (AAA), we map each protein primary sequence of length $L$ into a $400 \times (L-1)$ matrix $M$. From the singular value decomposition (SVD) of this matrix, we then derive a normalized 400-tuple mathematical descriptor $D$ of the primary protein sequence. The resulting 400-D normalized feature vectors (NFVs) facilitate quantitative analysis of protein sequences. Using this normalized representation, we analyze sequence similarity on two data sets: 1) ND5 sequences from nine species and 2) transferrin sequences of 24 vertebrates. We also compare the results of this study with those from related works. The two experiments illustrate that the proposed NFV-AAA approach performs well in sequence similarity analysis.
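
The pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify how the entries of $M$ are filled or exactly how $D$ is obtained from the SVD, so everything below is an assumption made for illustration: $M$ is taken to be a 0/1 indicator matrix (column $j$ marks the adjacent pair at positions $j$ and $j+1$), $D$ is taken to be the vector of singular values zero-padded to length 400 and scaled to unit Euclidean norm, and sequences are compared by Euclidean distance. The names `pair_matrix`, `descriptor`, and `distance` are hypothetical and do not come from the paper.

```python
from itertools import product

import numpy as np

# The 20 standard amino acids give 20 * 20 = 400 ordered adjacent pairs.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIR_INDEX = {a + b: i for i, (a, b) in enumerate(product(AMINO_ACIDS, repeat=2))}


def pair_matrix(seq):
    """Build a 400 x (L-1) matrix for a sequence of length L.

    Assumed encoding (not taken from the paper): column j holds a single 1
    in the row of the adjacent pair seq[j:j+2], and 0 elsewhere.
    """
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two residues")
    M = np.zeros((400, len(seq) - 1))
    for j in range(len(seq) - 1):
        M[PAIR_INDEX[seq[j:j + 2]], j] = 1.0  # KeyError on non-standard residues
    return M


def descriptor(seq):
    """Return a unit-norm 400-vector derived from the singular values of M.

    Assumption: the descriptor is the singular-value vector, zero-padded
    to length 400 when L-1 < 400, then divided by its Euclidean norm.
    """
    s = np.linalg.svd(pair_matrix(seq), compute_uv=False)  # min(400, L-1) values
    d = np.zeros(400)
    d[:len(s)] = s
    return d / np.linalg.norm(d)


def distance(seq_a, seq_b):
    """Euclidean distance between two descriptors (one possible similarity measure)."""
    return float(np.linalg.norm(descriptor(seq_a) - descriptor(seq_b)))
```

Under this assumed encoding, the singular values equal the square roots of the pair counts sorted in descending order, so the sketch compares sequences by the shape of their pair-frequency spectrum rather than by which pairs occur. The published method may construct $M$ and $D$ differently.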