Visualization of K-Tuple Distribution in Procaryote Complete Genomes and Their Randomized Counterparts

Authors:
Huimin Xie;Bailin Hao
Affiliations:
-;-
Venue:
CSB '02 Proceedings of the IEEE Computer Society Conference on Bioinformatics
Year:
2002

Citing 1
Cited 2

Theory and application of Marsaglia's monkey test for pseudorandom number generators

ACM Transactions on Modeling and Computer Simulation (TOMACS)

Shannon Information in Complete Genomes

CSB '04 Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference

Combinatorics from bacterial genomes

COCOA'07 Proceedings of the 1st international conference on Combinatorial optimization and applications


Abstract

A few years ago we developed a simple scheme to visualize the string composition of long DNA sequences in terms of two- and one-dimensional (2D and 1D) histograms. While the patterns in the 2D histograms are well understood, the structure of the 1D histograms has not been analyzed in detail. It turns out that the structure of the 1D histograms of genomic sequences and their randomized counterparts varies significantly with the G+C content of the genomes. In particular, the 1D histograms of some randomized sequences may show rich structure, a seemingly counterintuitive result. Three approaches are used to explain the phenomenon: (1) Monte Carlo simulation, (2) exact computation using the Goulden-Jackson cluster method, and (3) a Poisson approximation method. The multi-modal phenomena in K-histograms are well elucidated by the last approach.
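
As an illustration only, the 1D histogram and the randomized counterpart described in the abstract might be sketched as follows. This is a minimal sketch, not the authors' code: it assumes the 1D histogram records how many distinct K-tuples occur exactly n times among the overlapping K-tuples of a sequence, and that randomization means shuffling the letters so that base composition (and hence G+C content) is preserved. The function names and the four-letter alphabet are hypothetical choices made for this example.

```python
import random
from collections import Counter
from itertools import product


def count_k_tuples(seq, k):
    """Count every overlapping K-tuple (substring of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def one_d_histogram(seq, k, alphabet="ACGT"):
    """Map an occurrence count n to the number of K-tuples seen exactly n times.

    All len(alphabet)**k possible K-tuples are included, so tuples that
    never occur are counted under n = 0 (an assumption of this sketch).
    """
    counts = count_k_tuples(seq, k)
    hist = Counter(
        counts.get("".join(t), 0) for t in product(alphabet, repeat=k)
    )
    return dict(sorted(hist.items()))


def randomized_counterpart(seq, rng):
    """Shuffle the letters of seq, keeping its base composition unchanged."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


if __name__ == "__main__":
    rng = random.Random(2002)
    # Hypothetical stand-in for a genome with low G+C content (about 30%).
    genome = "".join(rng.choices("ACGT", weights=[35, 15, 15, 35], k=200_000))
    shuffled = randomized_counterpart(genome, rng)
    for n, num_tuples in one_d_histogram(shuffled, k=6).items():
        print(n, num_tuples)
```

Plotting the printed pairs as a bar chart gives a 1D K-histogram for the shuffled sequence. Changing the `weights` argument is one way to explore how the histogram shape responds to G+C content in a Monte Carlo setting.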