An annotated k-deep prefix tree for (1-k)-mer based sequence comparisons

Authors: Adrienne Breland; Karen Schlauch; Monica Nicolescu; Frederick C. Harris, Jr.
Affiliations: University of Nevada, Reno, NV (all authors)
Venue: Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology
Year: 2010

Abstract

In this report, we describe an algorithm for building a k-deep annotated prefix tree. The algorithm provides an alignment-free, computationally efficient method for comparing nucleotide sequences. Differences between genomic sequences are measured by recording and comparing, for each sequence, the counts of all words of length k or less. Each tree node is annotated with a list that stores the number of times the node's word occurs in each sequence of a group, so count differences among multiple sequences can be computed in a single tree traversal. The tree is built in time linear in sequence length, and its size is bounded by the tree depth k rather than by the length of the sequences. We then compare groups of E. coli and Influenza A virus H1N1 sequences to demonstrate the utility of a k-deep prefix tree as a sequence comparison tool.
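The abstract describes the structure only at a high level. The sketch below is one possible Python reading of it, not the authors' implementation. It assumes a plain dict-of-children trie, one integer count per input sequence at each node, and a sum of squared word-count differences as the comparison measure (chosen here for illustration; the paper's exact measure may differ). The names `Node`, `build_tree`, `word_counts`, and `pairwise_sq_diff` are hypothetical.

```python
from typing import Dict, List, Sequence


class Node:
    """One tree node; counts[j] = occurrences of this node's word in sequence j."""
    __slots__ = ("children", "counts")

    def __init__(self, n_seqs: int) -> None:
        self.children: Dict[str, "Node"] = {}
        self.counts: List[int] = [0] * n_seqs


def build_tree(seqs: Sequence[str], k: int) -> Node:
    """Count every word of length 1..k in each sequence (hypothetical helper)."""
    root = Node(len(seqs))
    for j, seq in enumerate(seqs):
        for start in range(len(seq)):
            node = root
            # Walk at most k characters; depth d corresponds to seq[start:start+d].
            for ch in seq[start:start + k]:
                child = node.children.get(ch)
                if child is None:
                    child = Node(len(seqs))
                    node.children[ch] = child
                child.counts[j] += 1
                node = child
    return root


def word_counts(root: Node, word: str) -> List[int]:
    """Return the per-sequence counts for one word (all zeros if absent)."""
    node = root
    for ch in word:
        nxt = node.children.get(ch)
        if nxt is None:
            return [0] * len(root.counts)
        node = nxt
    return list(node.counts)


def pairwise_sq_diff(root: Node) -> List[List[int]]:
    """One traversal: sum of squared count differences for every sequence pair."""
    n = len(root.counts)
    dist = [[0] * n for _ in range(n)]
    stack = list(root.children.values())  # the root itself holds no word
    while stack:
        node = stack.pop()
        c = node.counts
        for a in range(n):
            for b in range(a + 1, n):
                d = (c[a] - c[b]) ** 2
                dist[a][b] += d
                dist[b][a] += d
        stack.extend(node.children.values())
    return dist


if __name__ == "__main__":
    tree = build_tree(["ACGT", "ACGA"], k=2)
    print(word_counts(tree, "A"))        # [1, 2]
    print(pairwise_sq_diff(tree)[0][1])  # 4
```

Insertion walks at most k characters from each sequence position, so construction takes O(k·N) time for total sequence length N, which is linear for fixed k. Over the nucleotide alphabet the tree can never hold more than 4 + 4^2 + ... + 4^k nodes, however long the sequences are. In the example with k = 2, `ACGT` and `ACGA` differ only at the words `A`, `T`, `GT`, and `GA`, giving a distance of 4.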