The availability of DNA data is growing at roughly the rate described by Moore's law. In many molecular biology applications, these data must be compared with a reference sequence, either to establish the similarity of genomes or to identify functionally homologous subsequences. Current approaches based on pairwise sequence alignment are computationally expensive and often data-dependent. To ameliorate this problem, alternative, less complex sequence comparison schemes that capture the essential features of genomes must be explored. In this work, a new sequence comparison approach based on difference set models is proposed. These models are conceptually appropriate, as they quantify, in a certain sense, two key genome attributes: sequence complexity and symbol repetition. Moreover, it is shown that difference sets are abundant in bacterial genomes and that they coincide with homologous sequence segments. These findings motivate the construction of compact representations of DNA sequences in the difference set space. Aligning these representations permits computationally efficient identification of differences between the DNA sequences. To illustrate the efficacy of the difference set approach, indels in closely related Bacillus anthracis strains are characterized, resulting in the discovery of two previously unreported collections of polymorphisms. In addition, the open problem of extending the approach to difference set families and almost difference set families, for the analysis of more distant DNA sequences, is discussed.
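For readers unfamiliar with the underlying combinatorial object, the following is a minimal sketch of the standard definition of a cyclic (v, k, λ) difference set: a k-element subset of the integers modulo v in which every nonzero residue arises exactly λ times as a difference of two distinct elements. The abstract does not specify how DNA sequences are mapped to such sets, so the helper `symbol_positions` (positions of one symbol within a window of length v) is only an illustrative assumption, and both function names are hypothetical.

```python
from collections import Counter


def symbol_positions(window, symbol):
    """Hypothetical mapping (an assumption, not taken from the source):
    indices at which `symbol` occurs in the string `window`."""
    return [i for i, ch in enumerate(window) if ch == symbol]


def difference_set_lambda(positions, v):
    """Return lambda if `positions` form a cyclic (v, k, lambda) difference
    set in the integers modulo v, otherwise return None."""
    residues = sorted({p % v for p in positions})
    # Count every difference a - b (mod v) over ordered pairs of distinct elements.
    counts = Counter((a - b) % v for a in residues for b in residues if a != b)
    # Each nonzero residue must occur the same number of times.
    multiplicities = {counts.get(d, 0) for d in range(1, v)}
    if len(multiplicities) != 1:
        return None
    lam = multiplicities.pop()
    return lam if lam > 0 else None


if __name__ == "__main__":
    # {1, 2, 4} modulo 7 is the classical (7, 3, 1) difference set.
    print(difference_set_lambda([1, 2, 4], 7))
    # The quadratic residues modulo 11 form an (11, 5, 2) difference set.
    print(difference_set_lambda([1, 3, 4, 5, 9], 11))
    # Under the assumed mapping, the symbol C in this window sits at positions 1, 2, 4.
    print(difference_set_lambda(symbol_positions("ACCTCAA", "C"), 7))
```

The check is quadratic in the number of positions, which is adequate for short windows; it illustrates the definition only and makes no claim about how the proposed comparison scheme detects or aligns difference sets.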