Gene-finding via tandem mass spectrometry

Authors:
Ting Chen
Affiliations:
Department of Mathematics, University of Southern California, Los Angeles, CA
Venue:
RECOMB '01 Proceedings of the fifth annual international conference on Computational biology
Year:
2001

Abstract

We propose a new gene-finding methodology that combines high-performance liquid chromatography (HPLC) tandem mass spectrometry experiments with a fast algorithm to locate coding regions and introns. Proteins are first extracted from cells and digested by enzymes; the resulting peptides are then separated and analyzed by HPLC-tandem mass spectrometry. We designed an algorithm that finds DNA coding sequences in the genome, corresponding to open reading frames (ORFs), whose translated amino acid sequences are optimally correlated with these tandem mass spectra. The algorithm also allows one gap, corresponding to an intron, between two DNA coding sequences, so that their concatenation forms a single coding sequence. Finally, the algorithm assembles these candidate coding sequences and introns into gene structures. We applied the algorithm to predict genes on 4 contigs totaling 123 kbp, using two sets of simulated digestion-HPLC-tandem mass spectrometry data for 2523 Caenorhabditis elegans chromosome IV proteins, digested by trypsin and Asp-N, respectively. Among the 15 annotated genes on the forward strand, all 98 exons are hit by the predicted gap-free coding sequences, and 60 of the 83 introns are correctly predicted. We also tested gene structure prediction on a contig containing 3 genes: by combining splice site predictions with the predicted coding sequences and introns, we recovered all 3 gene structures.
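The search step the abstract describes can be illustrated in greatly simplified form. The sketch below finds genomic stretches whose translated peptides are consistent with observed mass spectrometry data. It is an illustrative assumption, not the paper's algorithm, and it makes the following simplifications:

* It scans only the three forward reading frames and translates them with the standard genetic code.
* It applies an in-silico trypsin digest using the common "cleave after K or R unless followed by P" rule.
* It matches each peptide's monoisotopic mass against observed precursor masses within a tolerance, instead of correlating full tandem spectra.
* It omits the single-intron gap and the gene-structure assembly step.

The function names (`translate`, `tryptic_peptides`, `peptide_mass`, `find_hits`) and the default tolerance are hypothetical.

```python
# Minimal sketch (assumption): gap-free matching of forward-frame tryptic
# peptides to observed precursor masses. Not the paper's spectrum-correlation
# algorithm; no intron gap and no gene-structure assembly.

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, ... GGG.
CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: CODE[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Monoisotopic residue masses (Da) and the mass of one water molecule.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def translate(dna, frame):
    """Translate full codons of `dna` starting at offset `frame`; unknown codons -> 'X'."""
    return "".join(CODON_TABLE.get(dna[p:p + 3], "X")
                   for p in range(frame, len(dna) - 2, 3))


def tryptic_peptides(protein):
    """Yield (residue_index, peptide): cut after K/R unless the next residue is P.
    Stop codons ('*') also terminate a peptide and are not included in it."""
    start = 0
    for i, aa in enumerate(protein):
        if aa == "*":
            if i > start:
                yield start, protein[start:i]
            start = i + 1
        elif aa in "KR" and (i + 1 == len(protein) or protein[i + 1] != "P"):
            yield start, protein[start:i + 1]
            start = i + 1
    if start < len(protein):
        yield start, protein[start:]


def peptide_mass(pep):
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(MONO[aa] for aa in pep) + WATER


def find_hits(dna, observed_masses, tol=0.02):
    """Return (frame, nucleotide_offset, peptide) for every forward-frame tryptic
    peptide whose mass lies within `tol` Da of some observed mass."""
    dna = dna.upper()
    hits = []
    for frame in range(3):
        protein = translate(dna, frame)
        for aa_start, pep in tryptic_peptides(protein):
            if "X" in pep:  # skip peptides containing untranslatable codons
                continue
            m = peptide_mass(pep)
            if any(abs(m - obs) <= tol for obs in observed_masses):
                hits.append((frame, frame + 3 * aa_start, pep))
    return hits


if __name__ == "__main__":
    dna = "ATGGCTGGTAAACTGCTGCGTCCGGAAAAATAA"  # encodes MAGKLLRPEK then a stop codon
    print(find_hits(dna, [405.2046]))  # [(0, 0, 'MAGK')]
```

Here the toy sequence translates in frame 0 to `MAGKLLRPEK*`. The digest yields MAGK and LLRPEK, because R followed by P is not cleaved. Only MAGK matches the single observed mass. Precursor mass alone is far less specific than a score over the fragment ions in each tandem spectrum, so a practical version would replace the mass test with a spectrum-scoring function. It would also extend the gap-free scan to allow one intron.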