Self-Organizing Approach for Automated Gene Identification

Authors:
Audrey Yu. Zinovyev;Alexander N. Gorban;Tatyana G. Popova
Affiliations:
Institut des Hautes Etudes Scientifiques, France, e-mail: zinovyev@ihes.fr;Institute of Computational Modeling of Russian Academy of Sciences, Akademgorodok, Krasnoyarsk, 660036 Russia, e-mail: gorban@icm.krasn.ru;Institute of Computational Modeling of Russian Academy of Sciences, Akademgorodok, Krasnoyarsk, 660036 Russia, e-mail: tanya@icm.krasn.ru
Venue:
Open Systems & Information Dynamics
Year:
2003

Self-organizing maps

Clustering Algorithms


Abstract

A self-training technique for automated gene recognition, applicable both to entire genomes and to unassembled ones, is proposed. It is based on a simple measure (namely, the vector of frequencies of non-overlapping triplets in a sliding window) and needs neither predetermined information nor preliminary learning. The sliding window length is the only tuning parameter; it should be chosen close to the average exon length typical of the DNA text under investigation. An essential feature of the proposed technique is preliminary visualization of the set of vectors in the subspace of the first three principal components. It was shown that the distribution of DNA sites has a bullet-like structure with one central cluster (corresponding to non-coding sites) and three or six flank clusters (corresponding to protein-coding sites). The bullet-like structure revealed in the distribution is itself an interesting illustration of triplet usage in DNA sequences. The method was tested on several genomes (the mitochondrion of P. wickerhamii, the bacterium C. crescentus, and the primitive eukaryote S. cerevisiae). The percentage of correctly predicted nucleotides exceeds 90%.
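
The measure named in the abstract, the vector of frequencies of non-overlapping triplets in a sliding window, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the step parameter, the handling of non-ACGT symbols, and the toy sequence are all assumptions made for illustration. The projection onto the first three principal components and the clustering are not shown.

```python
from itertools import product

# All 64 triplets in a fixed order, so every vector shares the same coordinates.
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]
INDEX = {t: i for i, t in enumerate(TRIPLETS)}


def triplet_frequencies(window):
    """Frequencies of non-overlapping triplets, read from the window's first base."""
    counts = [0] * 64
    total = 0
    for i in range(0, len(window) - 2, 3):
        codon = window[i:i + 3]
        if codon in INDEX:  # assumption: skip triplets with N or other symbols
            counts[INDEX[codon]] += 1
            total += 1
    return [c / total for c in counts] if total else [0.0] * 64


def sliding_window_vectors(sequence, window_length, step):
    """Yield (window centre, 64-dimensional frequency vector) along the sequence.

    `step` is a hypothetical parameter; the abstract names only the window length.
    """
    sequence = sequence.upper()
    for start in range(0, len(sequence) - window_length + 1, step):
        window = sequence[start:start + window_length]
        yield start + window_length // 2, triplet_frequencies(window)


# Toy input, not real genomic data.
toy = "ATGGCC" * 20
vectors = list(sliding_window_vectors(toy, window_length=30, step=3))
```

Each element of `vectors` pairs a window centre with a point in 64-dimensional space; a set of such points is what would then be projected and visualized. Shifting the window start by one base changes which triplets are counted, so the same region gives a different vector in each reading phase.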