An annotated k-deep prefix tree for (1-k)-mer based sequence comparisons

  • Authors:
  • Adrienne Breland;Karen Schlauch;Monica Nicolescu;Frederick C. Harris, Jr.

  • Affiliations:
  • University of Nevada, Reno, NV;University of Nevada, Reno, NV;University of Nevada, Reno, NV;University of Nevada, Reno, NV

  • Venue:
  • Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology
  • Year:
  • 2010

Abstract

In this report, we describe an algorithm for building a k-deep annotated prefix tree. The tree provides an alignment-free, computationally efficient method for comparing nucleotide sequences. Differences between genomic sequences are measured by recording and comparing the counts of words of length k or less in each sequence. Tree nodes are annotated with lists that store the number of times each word occurs in each sequence of a group. Count differences among multiple sequences can be computed in a single tree traversal. The tree is built in linear time, and its size is bounded by tree depth rather than by sequence length(s). We compare sequence groups of both E. coli and Influenza A virus H1N1 to demonstrate the utility of a k-deep prefix tree as a sequence comparison tool.
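
The data structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Node`, `build_tree`, `lookup`, and `count_differences` are hypothetical, and the distance used here (the sum of absolute count differences between two sequences over all words of length k or less) is an assumption chosen for illustration, since the abstract does not state the exact measure. Each node holds one count per sequence, and the tree never grows deeper than k, so for a nucleotide alphabet it holds at most 4 + 4^2 + ... + 4^k word nodes regardless of sequence length.

```python
class Node:
    """One tree node: child links plus one occurrence count per sequence."""
    __slots__ = ("children", "counts")

    def __init__(self, n_seqs):
        self.children = {}            # maps a character to a child Node
        self.counts = [0] * n_seqs    # counts[i] = occurrences in sequence i


def build_tree(seqs, k):
    """Insert every word of length 1..k from every sequence.

    Each start position walks at most k levels, so the work per sequence
    is proportional to len(seq) * k, which is linear in len(seq) for fixed k.
    """
    root = Node(len(seqs))
    for idx, seq in enumerate(seqs):
        for start in range(len(seq)):
            node = root
            for ch in seq[start:start + k]:
                child = node.children.get(ch)
                if child is None:
                    child = node.children[ch] = Node(len(seqs))
                node = child
                node.counts[idx] += 1   # the path from the root spells the word
    return root


def lookup(root, word):
    """Return the per-sequence counts for a word (zeros if it never occurs)."""
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return [0] * len(root.counts)
    return list(node.counts)


def count_differences(root, a, b):
    """Sum |count_a - count_b| over all stored words in one traversal.

    Assumed distance for illustration only.
    """
    total = 0
    stack = [root]
    while stack:
        node = stack.pop()
        total += abs(node.counts[a] - node.counts[b])
        stack.extend(node.children.values())
    return total
```

For example, `build_tree(["ACGT", "ACGA"], 2)` stores the words of length 1 and 2 from both sequences in a single tree, after which `count_differences(root, 0, 1)` compares the two sequences without revisiting the input strings.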