A Generalization of the Suffix Tree to Square Matrices, with Applications

  • Authors:
  • Raffaele Giancarlo

  • Venue:
  • SIAM Journal on Computing
  • Year:
  • 1995

Abstract

We describe a new data structure, the Lsuffix tree, which generalizes McCreight's suffix tree for a string [J. Assoc. Comput. Mach., 23 (1976), pp. 262--272] to a square matrix. All matrices have entries from a totally ordered alphabet $\Sigma$. Based on the Lsuffix tree, we give efficient algorithms for the static versions of the following dual problems that arise in low-level image processing and visual databases.

Two-dimensional pattern retrieval. We have a library of texts $S=\{TEXT^1,\ldots, TEXT^r\}$, where $TEXT^i$ is an $n_i\times n_i$ matrix, $1 \leq i \leq r$. We may preprocess the library. Then, given an $m \times m$ pattern matrix $PAT$, with $m \leq n_i$ for $1 \leq i \leq r$, we want to find all occurrences of $PAT$ in $TEXT$, for all $TEXT \in S$. Let $t(S)=\sum_{i=1}^r n_i^2$ be the size of the library. The preprocessing step builds the Lsuffix tree for the matrices in $S$ and then transforms it into an index (a trie defined over $\Sigma$). It takes $O(t(S)(\log |\Sigma| +\log t(S)))$ time and $O(t(S))$ space. The index can be queried directly in $O(m^2\log |\Sigma|+totocc)$ time, where $totocc$ is the total number of occurrences of $PAT$ in $TEXT$, for all $TEXT \in S$.

Two-dimensional dictionary matching. We have a dictionary of patterns $DC=\{PAT_1, \ldots, PAT_s\}$, where $PAT_i$ is of dimension $m_i\times m_i$, $1 \leq i \leq s$. We may preprocess the dictionary. Then, given an $n \times n$ text matrix $TEXT$, we want to find all occurrences of the dictionary patterns in the text. Let $t(DC)=\sum_{i=1}^s m^2_i$ be the size of the dictionary and let $\overline{t}(DC)$ be the sum of the $m_i$'s. The preprocessing consists of building the Lsuffix tree for the matrices in $DC$. It takes $O(t(DC)\log |\Sigma| +\overline{t}(DC)\log \overline{t}(DC))$ time and $O(t(DC))$ space. The search step takes $O(n^2(\log |\Sigma|+\log \overline{t}(DC))+totocc)$ time, where $totocc$ is the total number of occurrences of patterns in the text.

Both problems have a dynamic version in which the library and the dictionary, respectively, can be updated by inserting or deleting square matrices. In a companion paper, we will provide algorithms for the dynamic version.
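
To make the two problem statements concrete, the following is a minimal brute-force sketch in Python. It is not the Lsuffix tree and does not achieve the bounds stated in the abstract: it compares the pattern against every candidate position, which costs $O(n^2 m^2)$ time per text in the worst case and involves no preprocessing. All function names (`occurs_at`, `find_occurrences`, `pattern_retrieval`, `dictionary_matching`, `total_size`) are hypothetical and chosen for illustration only. Matrices are assumed to be lists of equal-length rows over any comparable symbols.

```python
from typing import List, Sequence, Tuple

# A square matrix over a totally ordered alphabet (integers here, by assumption).
Matrix = Sequence[Sequence[int]]


def occurs_at(text: Matrix, pat: Matrix, row: int, col: int) -> bool:
    """True if pat equals the m x m submatrix of text with top-left corner (row, col)."""
    m = len(pat)
    return all(
        text[row + i][col + j] == pat[i][j]
        for i in range(m)
        for j in range(m)
    )


def find_occurrences(text: Matrix, pat: Matrix) -> List[Tuple[int, int]]:
    """All top-left corners (row, col) at which pat occurs in text (naive scan)."""
    n, m = len(text), len(pat)
    # range() is empty when m > n, so an oversized pattern yields no occurrences.
    return [
        (row, col)
        for row in range(n - m + 1)
        for col in range(n - m + 1)
        if occurs_at(text, pat, row, col)
    ]


def pattern_retrieval(library: Sequence[Matrix], pat: Matrix) -> List[Tuple[int, int, int]]:
    """Pattern retrieval: (text index, row, col) for each occurrence of pat in the library."""
    return [
        (k, row, col)
        for k, text in enumerate(library)
        for (row, col) in find_occurrences(text, pat)
    ]


def dictionary_matching(dictionary: Sequence[Matrix], text: Matrix) -> List[Tuple[int, int, int]]:
    """Dictionary matching: (pattern index, row, col) for each dictionary pattern found in text."""
    return [
        (k, row, col)
        for k, pat in enumerate(dictionary)
        for (row, col) in find_occurrences(text, pat)
    ]


def total_size(matrices: Sequence[Matrix]) -> int:
    """Sum of squared side lengths, i.e. t(S) for a library or t(DC) for a dictionary."""
    return sum(len(mat) ** 2 for mat in matrices)
```

For example, with the $3 \times 3$ text `[[1, 2, 1], [2, 1, 2], [1, 2, 1]]` and the $2 \times 2$ pattern `[[1, 2], [2, 1]]`, `find_occurrences` returns `[(0, 0), (1, 1)]`. The two wrapper functions differ only in which side is the collection, which mirrors the duality between the two problems: in retrieval the texts are fixed and the pattern arrives as a query, while in dictionary matching the patterns are fixed and the text arrives as a query.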