Separable attributes: a technique for solving the sub matrices character count problem

  • Authors:
  • Amihood Amir; Kenneth W. Church; Emanuel Dar

  • Affiliations:
  • AT&T Labs - Research, Florham Park, NJ; AT&T Labs - Research, Florham Park, NJ; Bar-Ilan University, 52900 Ramat-Gan, Israel

  • Venue:
  • SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
  • Year:
  • 2002

Abstract

The subsequence character count problem has as its input an array $S = s_1, \ldots, s_n$ of symbols over alphabet $\Sigma$ and a natural number $m$. Its output is: for every $i$, $i = 1, \ldots, n - m + 1$, the number of different alphabet symbols occurring in the subsequence $s_i, s_{i+1}, \ldots, s_{i+m-1}$. The subsequence character count problem is a natural problem that has many uses. It can be solved in linear time for fixed finite alphabets and in time $O(n \log m)$ for infinite alphabets. In [1] the problem was used to solve the parameterized matching problem.

The character count problem can be generalized to two dimensions, where it becomes the submatrix character count problem. Its input is an $n \times n$ matrix $T$ over alphabet $\Sigma$ and a natural number $m$. Its output is: for every $i, j$, $i, j = 1, \ldots, n - m + 1$, the number of different alphabet symbols occurring in the submatrix $T[i + k, j + \ell]$, $k = 0, \ldots, m - 1$; $\ell = 0, \ldots, m - 1$.

This problem was motivated by parameterized matching in two dimensions, which is a good model for seeking a pattern in an image with a change of color map. The number of different colors in a subarea of an image is considered a "signature". There are many image processing tools that use this measure (see e.g. [5]).

The straightforward one dimensional solution slides a window along the text, adding an element and deleting an element at every step. The problem with two dimensions is that at every move of the window there are $m$ elements added and $m$ deleted.

In this paper we present an alternate solution that generalizes to two dimensions. We achieve an $O(n^2)$ time solution to the submatrix character count problem over a finite fixed alphabet and an $O(n^2 \log m)$ solution over an infinite alphabet.

The submatrix character count problem is a special case of the color range query problem, where one needs to preprocess a two dimensional $n \times n$ array $T$ of symbols over alphabet $\Sigma$ (the colors). Subsequently we are interested in answers to queries of the type: given intervals $[i_1, j_1]$ and $[i_2, j_2]$, with $i_1, i_2, j_1, j_2 \in \{1, \ldots, n\}$ and $i_1 \le j_1$, $i_2 \le j_2$, give the number of different alphabet symbols (colors) occurring in the submatrix $T[k, \ell]$, $k = i_1, \ldots, j_1$, $\ell = i_2, \ldots, j_2$.

Janardan and Lopez [6] showed that with $O(n^2 \log^2 n)$ preprocessing one can answer queries in time $O(\log^2 n)$. This means that the submatrices character count problem can be solved in time $O(n^2 \log^2 n)$ by preprocessing and then querying, for every location, the $m \times m$ submatrix starting at that location.

We are not aware of a faster direct approach for solving the submatrix character count problem. However, problems with a similar flavor, where the desired calculation is a convolution, are solved in electrical engineering by a method called Separable Convolutions or Separable Filters [4]. A similar notion was used by Bird [3] and Baker [2] to solve the two dimensional pattern matching problem.

The contributions of this paper are two-fold. First, we generalize the notion of separable convolutions to separable attributes. We believe it is important to keep this method in mind as an element of the basic algorithmic toolkit. It has proven useful in the past and, we think, will prove useful for solving various two-dimensional problems in the future. Secondly, we use the separable attributes method to provide the fastest algorithm yet for the submatrices character count problem.

A full version of this paper can be found at http://www.cs.biu.ac.il/~amir/Postscripts/sep.ps.
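
The straightforward one dimensional sliding-window solution described in the abstract can be sketched as follows. This is a minimal illustration, not code from the paper: the function names are hypothetical, indices are 0-based rather than 1-based, and the hash-based `Counter` is an assumption that gives expected (not worst-case) constant time per update. The second function is a brute-force two dimensional baseline that follows the problem definition directly; it is useful only for checking answers on small inputs and is not the algorithm the paper proposes.

```python
from collections import Counter


def subsequence_character_count(s, m):
    """Distinct-symbol count for every length-m window of s (assumes 1 <= m <= len(s))."""
    counts = Counter(s[:m])          # multiplicity of each symbol in the first window
    result = [len(counts)]
    for i in range(m, len(s)):
        counts[s[i]] += 1            # symbol entering the window
        outgoing = s[i - m]          # symbol leaving the window
        counts[outgoing] -= 1
        if counts[outgoing] == 0:
            del counts[outgoing]     # keep only symbols still present
        result.append(len(counts))
    return result


def submatrix_character_count_naive(t, m):
    """Brute-force baseline: inspect all m*m cells of every m x m submatrix of t."""
    n = len(t)
    return [
        [
            len({t[i + k][j + l] for k in range(m) for l in range(m)})
            for j in range(n - m + 1)
        ]
        for i in range(n - m + 1)
    ]
```

The one dimensional routine does a constant number of dictionary updates per step, whereas the baseline touches $m^2$ cells for each of the $(n - m + 1)^2$ window positions, which is the cost the paper's approach is meant to avoid.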
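
As background for the separable convolutions the abstract refers to, here is a small sketch of a separable filter in its simplest form: the sum over every $m \times m$ submatrix, computed as a one dimensional pass over the rows followed by a one dimensional pass over the columns of the intermediate result. The function names are hypothetical. This illustrates the electrical-engineering idea only; counting distinct symbols is not additive in this way, and the paper's generalization to separable attributes is not reproduced here.

```python
def sliding_sums(values, m):
    """Sum of every length-m window of a sequence (assumes 1 <= m <= len(values))."""
    total = sum(values[:m])
    out = [total]
    for i in range(m, len(values)):
        total += values[i] - values[i - m]   # add entering value, drop leaving value
        out.append(total)
    return out


def box_sums_separable(matrix, m):
    """Sum of every m x m submatrix via a row pass followed by a column pass."""
    # Row pass: each row becomes its 1 x m window sums.
    row_pass = [sliding_sums(row, m) for row in matrix]
    # Column pass: apply the same 1D routine down each column of the row-pass result.
    col_pass = [sliding_sums(list(col), m) for col in zip(*row_pass)]
    # col_pass is indexed [column][row]; transpose back to [row][column].
    return [list(row) for row in zip(*col_pass)]
```

Each pass does constant work per cell, so the whole computation is linear in the size of the matrix, compared with $m^2$ work per output cell for a direct two dimensional sum.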