In this paper we introduce a new alignment-free method for comparing sequences. The method is combinatorial by nature and uses neither a compressor nor any information-theoretic notion. It is based on an extension of the Burrows-Wheeler Transform, a transformation widely used in data compression. The extended transformation takes as input a multiset of sequences and produces as output a string obtained by a suitable rearrangement of the characters of all the input sequences. Using this transformation, we give a general method for comparing sequences that takes into account how much the characters coming from the different input sequences are mixed in the output string. The method is tested on a real data set for the whole mitochondrial genome phylogeny problem. However, the goal of this paper is to introduce a new and general methodology for the automatic categorization of sequences.
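The idea of "how much the characters are mixed" can be illustrated with a minimal Python sketch. It assumes the extended transformation sorts all rotations of all input sequences by the ω-order (comparing the infinite repetitions u^ω of the rotations) and outputs the last character of each sorted rotation, while recording which input sequence each output character came from. The function names `ebwt` and `mixing_dissimilarity`, the tie-breaking rule, and the scoring formula (fraction of adjacent output positions whose characters come from the same sequence) are hypothetical illustrations, not the paper's exact measure.

```python
from typing import List, Tuple


def omega_prefix(s: str, length: int) -> str:
    """Return the first `length` characters of the infinite word s s s ..."""
    reps = length // len(s) + 1
    return (s * reps)[:length]


def ebwt(seqs: List[str]) -> Tuple[str, List[int]]:
    """Sketch of an extended BWT over a multiset of sequences.

    Returns the output string and, for each output character, the index
    of the input sequence it came from (its "colour").
    """
    if not seqs or any(len(s) == 0 for s in seqs):
        raise ValueError("need a non-empty collection of non-empty sequences")
    # By the Fine-Wilf theorem, two distinct words u^w and v^w differ within
    # their first |u| + |v| characters, so comparing prefixes of length
    # 2 * max_len decides the omega-order of any pair of rotations.
    limit = 2 * max(len(s) for s in seqs)
    rotations = []
    for i, s in enumerate(seqs):
        for j in range(len(s)):
            rot = s[j:] + s[:j]
            # Ties between equal omega-words are broken by (sequence, start):
            # an assumption made here only to keep the output deterministic.
            rotations.append((omega_prefix(rot, limit), i, j, rot[-1]))
    rotations.sort()
    out = "".join(r[3] for r in rotations)
    colours = [r[1] for r in rotations]
    return out, colours


def mixing_dissimilarity(u: str, v: str) -> float:
    """Hypothetical score in [0, 1]: fraction of adjacent output positions
    whose characters come from the same input sequence. 0 means the two
    sequences are perfectly interleaved; values near 1 mean they stay in
    long separate blocks."""
    _, colours = ebwt([u, v])
    same = sum(1 for a, b in zip(colours, colours[1:]) if a == b)
    return same / (len(colours) - 1)


if __name__ == "__main__":
    print(ebwt(["banana"]))  # single sequence: ordinary BWT of its rotations
    print(mixing_dissimilarity("ab", "ab"))
    print(round(mixing_dissimilarity("aa", "bb"), 3))
```

For a single sequence the sketch reduces to the ordinary rotation-based BWT (`ebwt(["banana"])` yields `"nnbaaa"`). Two identical sequences interleave completely and score `0.0`, while `"aa"` and `"bb"` never mix and score `0.667`. Materialising every rotation costs quadratic memory, so this is only meant for illustration on short inputs.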