Rapid and sensitive dot-matrix methods for genome analysis

  • Authors:
  • Yue Huang; Ling Zhang

  • Affiliations:
  • Lynnon Corporation, 116 rue du Milicien, Vaudreuil-Dorion, Quebec, Canada, J7V 9M4; Lynnon Corporation, 116 rue du Milicien, Vaudreuil-Dorion, Quebec, Canada, J7V 9M4

  • Venue:
  • Bioinformatics
  • Year:
  • 2004

Abstract

Motivation: Dot-matrix plots are widely used for similarity analysis of biological sequences, and many algorithms and software tools have been developed for this purpose. Although some of these tools have been reported to handle sequences of a few hundred kb, analysis of genome sequences about 10 Mb long on a microcomputer is still impractical because of long execution times and high memory requirements.

Results: Two dot-matrix comparison methods have been developed for the analysis of large sequences. The methods first locate regions of similarity between two sequences using a fast word search algorithm, then perform an explicit comparison on those regions. Because the initial screening removes most random matches, computing time is substantially reduced. The methods produce high-quality dot-matrix plots with low background noise. Space requirements are linear, so the algorithms can be used to compare genome-size sequences. Computing speed may be affected by the highly repetitive sequence structures of eukaryotic genomes. A dot-matrix plot of the yeast genome (12 Mb), covering both strands, was generated in 80 s on a 1 GHz personal computer.

Availability: An implementation of the described methods in C is available at http://www.lynnon.com/dotplot/index.html
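
The two-stage idea described in the Results (a fast word search to find candidate regions, then an explicit comparison restricted to those regions) can be illustrated with a minimal Python sketch. This is not the authors' C implementation. The function names, the word length `k`, the window length `window`, and the threshold `min_matches` are all hypothetical choices made for illustration, and the sketch handles only the forward strand.

```python
from collections import defaultdict


def word_index(seq, k):
    """Map every length-k word in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def find_seeds(a, b, k):
    """Stage 1 (assumed form): exact k-word matches between a and b.

    Returns (i, j) pairs such that a[i:i+k] == b[j:j+k].
    The index holds one entry per word position in b.
    """
    index = word_index(b, k)
    seeds = []
    for i in range(len(a) - k + 1):
        for j in index.get(a[i:i + k], ()):
            seeds.append((i, j))
    return seeds


def dot_plot(a, b, k=4, window=8, min_matches=7):
    """Stage 2 (assumed form): explicit window comparison at seed positions only.

    A dot is recorded at (i, j) when the windows a[i:i+window] and
    b[j:j+window] agree in at least min_matches positions. Seeds whose
    window would run past the end of either sequence are skipped.
    """
    dots = set()
    for i, j in find_seeds(a, b, k):
        if i + window > len(a) or j + window > len(b):
            continue
        matches = sum(x == y for x, y in zip(a[i:i + window], b[j:j + window]))
        if matches >= min_matches:
            dots.add((i, j))
    return dots


if __name__ == "__main__":
    a = "ACGTTGCAAGCT"
    b = "ACGTTGCTAGCT"  # differs from a at one position
    for i, j in sorted(dot_plot(a, b)):
        print(i, j)
```

Because the window comparison runs only where a shared word was found, cell pairs with no exact `k`-word match are never examined, which is the source of the speed-up the abstract describes. Highly repetitive sequences produce many seeds per word, which is consistent with the abstract's note that repeats can slow the computation.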