The submatrices character count problem: an efficient solution using separable values

  • Authors:
  • Amihood Amir;Kenneth W. Church;Emanuel Dar

  • Affiliations:
  • Department of Computer Science, Bar-Ilan University, 52900 Ramat-Gan, Israel and Georgia Tech.;AT&T Labs - Research, Shannon Laboratory, D235, 180 Park Avenue, Florham Park, NJ;Department of Computer Science, Bar-Ilan University, 52900 Ramat-Gan, Israel

  • Venue:
  • Information and Computation
  • Year:
  • 2004

Abstract

The subsequence character count problem has as its input an array S = S[1],...,S[n] of symbols over alphabet Σ and a natural number m. Its output is, for every i = 1,...,n - m + 1, the number of different alphabet symbols occurring in the subsequence S[i],S[i + 1],...,S[i + m - 1]. The subsequence character count problem is a natural problem that has many uses. It can be solved in linear time for finite alphabets and in time O(n log m) for infinite alphabets. When the character count problem is generalized to two dimensions, it becomes the submatrix character count problem. Its input is an n × n matrix T over alphabet Σ and a natural number m. Its output is, for every i,j = 1,...,n - m + 1, the number of different alphabet symbols occurring in the submatrix T[i + k,j + l], k = 0,...,m - 1; l = 0,...,m - 1. The straightforward one-dimensional solution slides a window along the text, adding one element and deleting one element at every step. The difficulty in two dimensions is that every move of the window adds m elements and deletes m elements. In this paper, we present an alternate one-dimensional solution that generalizes to two dimensions. We achieve an O(n²) time solution to the submatrix character count problem over a finite alphabet and an O(n² log m) solution over an infinite alphabet.
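To make the two problem statements concrete, here is a minimal Python sketch. It is not the paper's separable-values algorithm. It shows only the straightforward one-dimensional sliding window described in the abstract, plus a brute-force reference for the two-dimensional output. The function names, the 0-based indexing, and the use of a hash-based `Counter` (which gives expected rather than worst-case bounds) are assumptions of this sketch.

```python
from collections import Counter


def window_char_counts(s, m):
    """Distinct-symbol count for every length-m window of s (0-based).

    Sliding-window approach: each step adds one symbol on the right
    and deletes one symbol on the left. The hash-based Counter is an
    assumption of this sketch, not the paper's model of computation.
    """
    n = len(s)
    if m <= 0 or m > n:
        return []
    counts = Counter(s[:m])           # multiplicities in the first window
    result = [len(counts)]
    for i in range(m, n):
        counts[s[i]] += 1             # symbol entering the window
        old = s[i - m]                # symbol leaving the window
        counts[old] -= 1
        if counts[old] == 0:
            del counts[old]           # keep len(counts) == number of distinct symbols
        result.append(len(counts))
    return result


def submatrix_char_counts_naive(t, m):
    """Brute-force reference for the two-dimensional problem.

    Inspects all m*m cells of every m x m submatrix, so it costs
    O(n^2 * m^2) set insertions. It only illustrates the required
    output; it is not the efficient solution the paper presents.
    """
    size = len(t) - m + 1
    return [
        [len({t[i + k][j + l] for k in range(m) for l in range(m)})
         for j in range(size)]
        for i in range(size)
    ]
```

For example, `window_char_counts("abcabb", 3)` returns `[3, 3, 3, 2]`, and `submatrix_char_counts_naive(["aab", "abb", "bbb"], 2)` returns `[[2, 2], [2, 1]]`. Applying the window update directly in two dimensions would touch m entering and m leaving cells per move, which is the cost the abstract points to.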