On building minimal automaton for subset matching queries

  • Authors:
  • Kimmo Fredriksson

  • Affiliations:
  • School of Computing, University of Eastern Finland, P.O. Box 1627, 70211 Kuopio, Finland

  • Venue:
  • Information Processing Letters
  • Year:
  • 2010

Abstract

We address the problem of building an index for a set $D$ of $n$ strings, where each string location is a subset of some finite integer alphabet of size $\sigma$, so that we can efficiently answer whether a given simple query string $p$ (where each string location is a single symbol) occurs in the set. That is, we need to efficiently find a string $d \in D$ such that $p[i] \in d[i]$ for every $i$. We show how to build such an index in $O(n^{\log_{\sigma/\Delta}\sigma}\log n)$ average time, where $\Delta$ is the average size of the subsets. Our methods have applications, e.g., in computational biology (haplotype inference) and music information retrieval.
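The abstract defines the query but not the construction. Purely as an illustration, here is a minimal Python sketch that assumes all strings in D have the same length m and that symbols are the integers 0..σ-1. It first states the query directly (a scan over D), then builds a deterministic automaton by a subset-style construction in which each state records the current depth and the set of strings still compatible with the prefix read so far, so a query takes m transitions. This is not the paper's algorithm, and the result is not guaranteed minimal: it merges only states whose surviving sets are identical, whereas a minimal automaton would also merge states that accept the same suffixes (for example via Hopcroft's algorithm). The names `matches_naive`, `build_automaton`, and `query` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

# A "subset string": position i holds the set of symbols allowed there.
SubsetString = Sequence[Set[int]]
# Hypothetical state encoding: (depth, indices of strings still compatible).
State = Tuple[int, FrozenSet[int]]


def matches_naive(D: List[SubsetString], p: Sequence[int]) -> bool:
    """Direct definition: is there d in D with p[i] in d[i] for every i?"""
    return any(len(d) == len(p) and all(c in s for c, s in zip(p, d)) for d in D)


def build_automaton(D: List[SubsetString], sigma: int):
    """Illustrative subset-construction DFA (assumes equal-length strings).

    Reading symbol c at depth i keeps only the strings j with c in D[j][i].
    States with the same (depth, surviving set) are shared; dead states
    (empty surviving set) are omitted. Not claimed to be minimal.
    """
    m = len(D[0]) if D else 0
    start: State = (0, frozenset(range(len(D))))
    delta: Dict[State, Dict[int, State]] = {}
    stack = [start]
    while stack:
        state = stack.pop()
        if state in delta:  # already expanded
            continue
        depth, alive = state
        delta[state] = {}
        if depth == m:  # final depth: no outgoing transitions
            continue
        for c in range(sigma):
            survivors = frozenset(j for j in alive if c in D[j][depth])
            if survivors:
                nxt = (depth + 1, survivors)
                delta[state][c] = nxt
                if nxt not in delta:
                    stack.append(nxt)
    return start, delta, m


def query(automaton, p: Sequence[int]) -> bool:
    """Follow one transition per symbol; accept if some string survives."""
    start, delta, m = automaton
    if len(p) != m:
        return False
    state = start
    for c in p:
        state = delta[state].get(c)
        if state is None:  # fell into the (omitted) dead state
            return False
    return bool(state[1])
```

For example, with `D = [[{0, 1}, {2}], [{1}, {0, 3}]]` and `sigma = 4`, the query `[1, 3]` is accepted through the second string, while `[0, 0]` is rejected. The number of states can grow quickly as the subsets get larger, which is the kind of cost the average-time bound above is concerned with.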