In this report, we describe an algorithm for building a k-deep annotated prefix tree. The algorithm provides a computationally efficient, alignment-free method for comparing nucleotide sequences. Differences between genomic sequences are measured by recording and comparing the counts of all words of length k or less in each sequence. Each tree node is annotated with a list that stores the number of times the corresponding word occurs in each sequence of a group, so count differences among multiple sequences can be computed in a single tree traversal. The tree is built in linear time, and its size is bounded by the tree depth rather than by the sequence lengths. We then compare groups of E. coli sequences and groups of Influenza A virus H1N1 sequences to demonstrate the utility of a k-deep prefix tree as a sequence comparison tool.
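
The idea can be illustrated with a minimal Python sketch. It is not the authors' implementation; the names `Node`, `build_tree`, and `count_differences` are hypothetical and chosen only for illustration. The sketch assumes that every word of length 1 to k is counted at each starting position, and that words running past the end of a sequence are truncated. Each node holds one counter per sequence, so a single traversal yields the count differences between any two sequences.

```python
class Node:
    """One word in the prefix tree, annotated with per-sequence counts."""
    __slots__ = ("children", "counts")

    def __init__(self, n_seqs):
        self.children = {}           # next symbol -> child Node
        self.counts = [0] * n_seqs   # occurrences of this word in each sequence


def build_tree(sequences, k):
    """Insert every word of length <= k from every sequence.

    The work is O(k * total sequence length), which is linear for fixed k.
    Depth never exceeds k, so over a 4-letter alphabet the tree holds at
    most 4 + 4**2 + ... + 4**k nodes regardless of sequence length.
    """
    n = len(sequences)
    root = Node(n)
    for idx, seq in enumerate(sequences):
        for start in range(len(seq)):
            node = root
            # Walk at most k symbols down from the root.
            for ch in seq[start:start + k]:
                child = node.children.get(ch)
                if child is None:
                    child = node.children[ch] = Node(n)
                node = child
                node.counts[idx] += 1
    return root


def count_differences(root, a, b):
    """Return {word: count_a - count_b} for words whose counts differ,
    computed in a single traversal of the tree."""
    diffs = {}
    stack = [(root, "")]
    while stack:
        node, word = stack.pop()
        if word:
            d = node.counts[a] - node.counts[b]
            if d:
                diffs[word] = d
        for ch, child in node.children.items():
            stack.append((child, word + ch))
    return diffs


tree = build_tree(["ACGT", "ACGA"], k=2)
print(count_differences(tree, 0, 1))
```

With the two toy sequences above and k = 2, the words that differ are A, T, GT, and GA; every other word of length 1 or 2 occurs equally often in both sequences and is omitted from the result.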