Uniquely decodable n-gram embeddings

  • Authors: Leonid Kontorovich
  • Affiliations: Center for Automated Learning and Discovery, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA
  • Venue: Theoretical Computer Science
  • Year: 2004

Abstract

We define the family of n-gram embeddings from strings over a finite alphabet into the semimodule N^K. We classify all ξ ∈ N^K that are valid images of strings under such embeddings, as well as all ξ whose inverse image consists of exactly one string (we call such ξ uniquely decodable). We prove that for a fixed alphabet, the set of all strings whose image is uniquely decodable is a regular language.
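
To make the notion of unique decodability concrete, here is a minimal brute-force sketch. It assumes the embedding sends a string to its vector of n-gram counts, with no padding symbols at the string boundaries; the abstract does not spell out the precise definition, so this may differ from the paper's construction. The function names (ngram_counts, preimage, is_uniquely_decodable) are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product


def ngram_counts(s, n=2):
    """Count vector of the contiguous n-grams of s (assumed form of the embedding)."""
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def preimage(xi, alphabet, n=2):
    """All strings over `alphabet` whose n-gram count vector equals xi.

    Under the no-padding assumption, a string with at least one n-gram has
    length sum(xi) + n - 1, so only that length is enumerated. The all-zero
    vector (strings shorter than n) is not treated fully here.
    """
    length = sum(xi.values()) + n - 1
    candidates = ("".join(chars) for chars in product(alphabet, repeat=length))
    return {w for w in candidates if ngram_counts(w, n) == xi}


def is_uniquely_decodable(s, alphabet, n=2):
    """True if s is the only string sharing its n-gram count vector."""
    return preimage(ngram_counts(s, n), alphabet, n) == {s}
```

Under these assumptions, over the alphabet {a, b} with n = 2, the string "abab" is the only string with its bigram counts, whereas "aabab" and "abaab" share the same counts (aa once, ab twice, ba once), so their common image is not uniquely decodable. The enumeration is exponential in the string length and is meant only to illustrate the definition, not the classification proved in the paper.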