Deciding word neighborhood with universal neighborhood automata

Authors:
Petar Mitankin;Stoyan Mihov;Klaus U. Schulz
Affiliations:
Institute for Parallel Processing, Bulgarian Academy of Sciences, Block 25A, Acad. G. Bonchev Street, 1113 Sofia, Bulgaria;Institute for Parallel Processing, Bulgarian Academy of Sciences, Block 25A, Acad. G. Bonchev Street, 1113 Sofia, Bulgaria;Centrum für Informations- und Sprachverarbeitung, Ludwig-Maximilians-Universität München, Oettingenstr. 67, 80538 München, Germany
Venue:
Theoretical Computer Science
Year:
2011

Abstract

Given some notion of distance between words, a fundamental operation is to decide whether the distance between two given words w and v is within a given bound. In earlier work, we introduced the concept of a universal Levenshtein automaton for a given distance bound n. This deterministic automaton takes as input a sequence of bitvectors computed from w and v, and it accepts that sequence if and only if the Levenshtein distance between w and v does not exceed n. The automaton is called universal because the same automaton can be used for arbitrary input words w and v, regardless of the underlying input alphabet. Here, we extend this picture. After introducing a large abstract family of generalized word distances, we characterize exactly those members for which word neighborhood can be decided using universal neighborhood automata similar to universal Levenshtein automata. Our theoretical results establish several bridges to the theory of synchronized finite-state transducers and to dynamic programming. For small neighborhood bounds, universal neighborhood automata can be held in main memory, which leads to very efficient algorithms for the above decision problem. Evaluation results show that these algorithms are much faster than those based on dynamic programming.
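
To make the decision problem concrete, the following Python sketch shows the dynamic-programming baseline that the abstract compares against, restricted to a diagonal band of width 2n + 1. It is not the universal automaton construction of the paper. The helper `characteristic_vector` illustrates the kind of bitvector such an automaton could read, but the choice of window in `bitvector_sequence` is an assumption made for illustration, and all function names are hypothetical.

```python
from typing import List, Tuple


def within_distance(w: str, v: str, n: int) -> bool:
    """Return True iff the Levenshtein distance between w and v is at most n.

    Banded dynamic programming: only cells (i, j) with |i - j| <= n are
    computed, since every cell outside the band has distance above n.
    """
    if n < 0 or abs(len(w) - len(v)) > n:
        return False
    big = n + 1  # any value above n means "out of bound"
    # prev[j] holds the distance between w[:i-1] and v[:j], capped at big.
    prev = [j if j <= n else big for j in range(len(v) + 1)]
    for i in range(1, len(w) + 1):
        cur = [big] * (len(v) + 1)
        lo = max(0, i - n)
        hi = min(len(v), i + n)
        if lo == 0:
            cur[0] = i  # deleting the first i symbols of w; i <= n here
        for j in range(max(1, lo), hi + 1):
            cost = 0 if w[i - 1] == v[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost,  # match or substitution
                         prev[j] + 1,         # deletion from w
                         cur[j - 1] + 1,      # insertion into w
                         big)
        prev = cur
    return prev[len(v)] <= n


def characteristic_vector(c: str, window: str) -> Tuple[int, ...]:
    """Bit k is 1 iff the k-th symbol of window equals c."""
    return tuple(1 if x == c else 0 for x in window)


def bitvector_sequence(w: str, v: str, n: int) -> List[Tuple[int, ...]]:
    """One bitvector per symbol of w, taken over a window of v.

    ASSUMPTION: the window v[i-n : i+n+1] is an illustrative choice,
    not necessarily the exact definition used in the paper.
    """
    return [characteristic_vector(c, v[max(0, i - n): i + n + 1])
            for i, c in enumerate(w)]
```

For example, `within_distance("kitten", "sitting", 3)` holds while `within_distance("kitten", "sitting", 2)` does not. The bitvectors depend only on where symbols of w and v coincide, not on the symbols themselves, which is what lets a single automaton serve any alphabet.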