Segmenting strings homogeneously via trees

  • Author: Peter Damaschke
  • Affiliation: Department of Computer Science and Engineering, Chalmers University, Göteborg, Sweden
  • Venue: WG'07: Proceedings of the 33rd International Conference on Graph-Theoretic Concepts in Computer Science
  • Year: 2007

Abstract

We divide a string into k segments, each containing only one kind of symbol, so as to minimize the total number of exceptions. Motivations come from machine learning and data mining. For binary strings we develop a linear-time algorithm for any k. Key to its efficiency is a special-purpose data structure, called a W-tree, which reflects relations between the repetition lengths of symbols. Whether algorithms faster than the obvious dynamic programming exist remains open for non-binary strings. Our problem is also equivalent to finding weighted independent sets of prescribed size in paths. We show that this problem in bounded-degree graphs is fixed-parameter tractable (FPT).
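
To make the problem statement concrete, here is a minimal sketch of what the "obvious dynamic programming" baseline might look like. It is not the paper's linear-time W-tree algorithm, and it rests on an assumed reading of the abstract: each of the k non-empty contiguous segments is assigned one symbol, and an exception is any position whose symbol differs from its segment's assigned symbol. The function name `min_exceptions` and the table layout are illustrative choices, not taken from the paper.

```python
def min_exceptions(s: str, k: int) -> int:
    """Minimum number of exceptions when s is cut into k non-empty segments.

    Assumed model (not confirmed by the source): every segment gets one
    label symbol, and each position that disagrees with its segment's
    label counts as one exception.
    """
    if k < 1 or len(s) < k:
        raise ValueError("need 1 <= k <= len(s)")

    alphabet = sorted(set(s))
    inf = float("inf")

    # dp[j][c]: best cost for the prefix read so far, using exactly j
    # segments, where the last segment is labeled c.
    dp = [{c: inf for c in alphabet} for _ in range(k + 1)]
    # best[j]: minimum of dp[j][c] over all labels c.
    best = [inf] * (k + 1)
    best[0] = 0  # zero segments cover the empty prefix at no cost

    for ch in s:
        new = [{c: inf for c in alphabet} for _ in range(k + 1)]
        for j in range(1, k + 1):
            for c in alphabet:
                cost = 0 if c == ch else 1
                # Either extend the current segment labeled c,
                # or start a new segment labeled c at this position.
                new[j][c] = min(dp[j][c], best[j - 1]) + cost
        dp = new
        best = [inf] + [min(row.values()) for row in dp[1:]]

    return best[k]
```

This sketch runs in O(n · k · |Σ|) time for a string of length n over alphabet Σ, which is the kind of bound the abstract's linear-time result for binary strings improves on. For example, `min_exceptions("0010011", 2)` evaluates to 1, corresponding to the split `00100 | 11` with the single `1` in the first segment as the exception.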