Classification With Finite Memory Revisited

  • Authors: J. Ziv
  • Affiliation: Department of Electrical Engineering, Technion – Israel Institute of Technology, Haifa
  • Venue: IEEE Transactions on Information Theory
  • Year: 2007

Abstract

We consider the class of strong-mixing probability laws with positive transitions that are defined on doubly infinite sequences over a finite alphabet $A$. A device called the classifier (or discriminator) observes a training sequence whose probability law $Q$ is unknown. The classifier's task is to consider a second probability law $P$ and decide whether $P = Q$, or whether $P$ and $Q$ are sufficiently different according to some appropriate criterion $\Delta(Q,P) > \Delta$. If the classifier has an infinite amount of training data available, this is a simple matter. Here, however, we study the case where the amount of training data is limited to $N$ letters. We define a function $N_{\Delta}(Q|P)$, which quantifies the minimum sequence length needed to distinguish $Q$ from $P$, and the class $M(N_{\Delta})$ of all pairs of probability laws $(Q,P)$ that satisfy $N_{\Delta}(Q|P) \le N_{\Delta}$ for some given positive number $N_{\Delta}$. It is shown that every pair $(Q,P)$ of probability laws that are sufficiently different according to the $\Delta$ criterion is contained in $M(N_{\Delta})$. We demonstrate that for any universal classifier there exists some $Q$ for which the probability of classification error $\lambda(Q) = 1$ for some $N$-sequence emerging from $Q$, for some $P : (Q,P) \in M^{\circ}(N_{\Delta})$ with $\Delta(Q,P) > \Delta$, if $N < N_{\Delta}$. Conversely, we introduce a classification algorithm that is essentially optimal in the sense that for every $(Q,P) \in M(N_{\Delta})$, the probability of classification error $\lambda(Q)$ vanishes uniformly with $N$ for every $P : (Q,P) \in M^{\circ}(N_{\Delta})$ if $N \ge N_{\Delta}^{\,1+O(\log\log N_{\Delta}/\log N_{\Delta})}$. The proposed algorithm finds the largest empirical conditional divergence over a set of contexts that appear in the tested $N$-sequence. Its computational complexity is $O(N^2(\log N)^3)$. We also introduce a second, simplified context classification algorithm with a computational complexity of only $O(N(\log N)^4)$ that is efficient in the sense that for every pair $(Q,P) \in M(N_{\Delta})$, the pairwise probability of classification error $\lambda(Q,P)$ vanishes with $N$ if $N \ge N_{\Delta}^{\,1+O(\log\log N_{\Delta}/\log N_{\Delta})}$. Conversely, $\lambda(Q,P) = 1$ for at least some $(Q,P) \in M(N_{\Delta})$ if $N < N_{\Delta}$.
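
The core decision rule described above — estimate the conditional law of the training sequence for each observed context, take the largest empirical conditional divergence against $P$, and compare it to a threshold — can be illustrated with a short sketch. The following Python is only a hedged illustration, not the paper's algorithm: the function names, the fixed context length, the maximum-likelihood estimate of $Q(\cdot|s)$, and the threshold are all assumptions, and it makes no attempt to reproduce the variable-length context selection behind the stated $O(N^2(\log N)^3)$ complexity.

```python
import math
from collections import Counter, defaultdict

def max_empirical_conditional_divergence(train, context_len, p_cond, alphabet):
    """Largest empirical conditional divergence over contexts seen in `train`.

    `p_cond[s][a]` is the candidate law P's conditional probability of
    letter `a` given context `s` (a tuple of letters). P is assumed to
    have positive transitions, so every p_cond[s][a] > 0.
    """
    # Empirical conditional counts for Q_hat(a | s), from sliding windows.
    counts = defaultdict(Counter)
    for i in range(context_len, len(train)):
        s = tuple(train[i - context_len:i])
        counts[s][train[i]] += 1

    worst = 0.0
    for s, ctr in counts.items():
        total = sum(ctr.values())
        # Empirical conditional divergence D( Q_hat(.|s) || P(.|s) ).
        d = 0.0
        for a in alphabet:
            q = ctr[a] / total
            if q > 0.0:
                d += q * math.log(q / p_cond[s][a])
        worst = max(worst, d)
    return worst

def classify(train, context_len, p_cond, alphabet, threshold):
    """Declare "P differs from Q" when the worst context divergence is large."""
    stat = max_empirical_conditional_divergence(train, context_len, p_cond, alphabet)
    return stat > threshold

# Toy usage: the training law Q tends to repeat letters, while P is
# memoryless and uniform, so the context-conditional divergence is large.
p_uniform = defaultdict(lambda: {"a": 0.5, "b": 0.5})
train = "aaaabbbbaaaabbbb" * 25  # stands in for the N-letter training sequence
print(classify(train, context_len=1, p_cond=p_uniform, alphabet="ab", threshold=0.1))
```

In the toy usage, the repetition structure of the training sequence pushes $\hat{Q}(\cdot|s)$ far from the uniform $P(\cdot|s)$ for both one-letter contexts, so the statistic exceeds the threshold and the sketch declares the laws different.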