The effect of target vector selection on the invariance of classifier performance measures
IEEE Transactions on Neural Networks
Many bacterial pathogens employ a Type III secretion system (TTSS) to deliver specific proteins (or "substrates") into the host cytoplasm, where they interfere with defense responses and alter host physiology. In this work, we present a computational formalism for characterizing the compositional properties of the Type III secretion signal. Although various rule sets derived from empirical observations have been suggested, developing a consistent and comprehensive description of the TTSS signal remains of interest. This problem differs from typical signal peptide classification and identification problems (e.g., nuclear, chloroplast, or mitochondrial signal peptides) because known TTSS substrates lack the similarity expected of signal sequences involved in a similar function (e.g., as captured by a multiple alignment profile or a signal consensus sequence). Using a training set derived from empirically verified substrate sequences in Pseudomonas syringae, we apply divergence measures derived from information theory to classify similar patterns and characterize the Type III signal. The resulting characterization points to a diffuse targeting signal confined to the first 50 amino acids from the N-terminus. Finally, using the P. syringae training set, the method is applied to verify and predict substrate candidates in other organisms possessing a TTSS.
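The abstract does not say which divergence measure is used or how sequences are encoded, so the following is only a minimal sketch of the general idea: compare the amino acid composition of the first 50 residues of two sequences with an information-theoretic divergence. The choice of Jensen-Shannon divergence, the add-one pseudocount, the function names, and the toy sequences are all assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq, window=50, pseudocount=1.0):
    """Smoothed residue frequencies over the first `window` residues.

    The window of 50 follows the N-terminal region named in the abstract;
    the pseudocount is an assumption that keeps every frequency nonzero.
    """
    prefix = seq[:window].upper()
    counts = Counter(c for c in prefix if c in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(p[a] * math.log2(p[a] / q[a]) for a in AMINO_ACIDS)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: symmetric, between 0 and 1."""
    m = {a: 0.5 * (p[a] + q[a]) for a in AMINO_ACIDS}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Toy strings invented for this example; they are NOT real TTSS substrates.
seq_a = "MSSTPQSSNSSQTSPSSLNSTQSPSGSSTNQSSPTSQSSLSNSTPSQSSGAAAAAAAAAA"
seq_b = "MKLVLLAVLLGVAALFWIVKELLKRAVDEIIGKLVEAFKRLLEDAIKKVLEWWWWWWWWWW"

score = js_divergence(composition(seq_a), composition(seq_b))
print(f"JS divergence over the first 50 residues: {score:.3f} bits")
```

Under these assumptions, a candidate could be scored by its divergence from a composition pooled over verified substrates and from a background composition, with a smaller divergence from the substrate pool suggesting a more substrate-like N-terminus. Any decision threshold would have to be calibrated on real training data.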