Algorithms for subsequence combinatorics

  • Authors:
  • Cees Elzinga; Sven Rahmann; Hui Wang

  • Affiliations:
  • Department of Social Science Research Methods, VU University Amsterdam, The Netherlands; Bioinformatics for High-throughput Technologies, Computer Science 11, TU Dortmund, Germany; School of Computing and Mathematics, University of Ulster, Northern Ireland, UK

  • Venue:
  • Theoretical Computer Science
  • Year:
  • 2008

Abstract

A subsequence is obtained from a string by deleting any number of characters; thus, in contrast to a substring, a subsequence is not necessarily a contiguous part of the string. Counting subsequences under various constraints has become relevant to biological sequence analysis, to machine learning, to coding theory, to the analysis of categorical time series in the social sciences, and to the theory of word complexity. We present theorems that lead to efficient dynamic programming algorithms to count (1) distinct subsequences in a string, (2) distinct common subsequences of two strings, (3) matching joint embeddings in two strings, (4) distinct subsequences with a given minimum span, and (5) sequences generated by a string allowing characters to come in runs of a length that is bounded from above.
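
As an illustration of item (1), counting distinct subsequences of a string, the following minimal Python sketch uses a widely known dynamic programming recurrence. It is not taken from the paper and may differ from the authors' formulation. The function names `count_distinct_subsequences` and `brute_force_count` are hypothetical, and the count is assumed to include the empty subsequence.

```python
from itertools import combinations


def count_distinct_subsequences(s: str) -> int:
    """Count distinct subsequences of s, including the empty one.

    Assumed recurrence (standard, not necessarily the paper's):
    dp[i] = 2 * dp[i - 1] - dp[p - 1], where p is the 1-based position
    of the previous occurrence of s[i - 1]; the subtraction is skipped
    when the character has not been seen before.
    """
    dp = [1] * (len(s) + 1)  # dp[0] = 1 counts the empty subsequence
    last = {}                # character -> 1-based position of last occurrence
    for i, ch in enumerate(s, start=1):
        # Every subsequence of the prefix either omits or appends ch.
        dp[i] = 2 * dp[i - 1]
        if ch in last:
            # Remove subsequences already produced by the earlier ch.
            dp[i] -= dp[last[ch] - 1]
        last[ch] = i
    return dp[len(s)]


def brute_force_count(s: str) -> int:
    """Exponential-time reference: enumerate all subsequences explicitly."""
    return len({"".join(c) for r in range(len(s) + 1)
                for c in combinations(s, r)})


if __name__ == "__main__":
    for text in ["", "a", "aaa", "abab", "banana"]:
        print(repr(text), count_distinct_subsequences(text),
              brute_force_count(text))
```

The dynamic program runs in time linear in the length of the string (treating dictionary lookups as constant time), whereas the brute-force check is exponential and only practical for short inputs.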