Maximizing Agreement with a Classification by Bounded or Unbounded Number of Associated Words

  • Authors:
  • Hiroki Arimura;Shinichi Shimozono

  • Venue:
  • ISAAC '98 Proceedings of the 9th International Symposium on Algorithms and Computation
  • Year:
  • 1998

Abstract

We study the efficient discovery of word-association patterns, each defined by a sequence of strings and a proximity gap, from a collection of texts with binary labels. We present an algorithm that finds all word-association patterns with d strings and proximity k that maximize agreement with the labels. If the texts are uniformly random strings, it runs in expected time $O(k^{d-1} n \log^{d+1} n)$ and space $O(k^{d-1} n)$, where n is the total length of the texts. We also show that the problem of finding a best word-association pattern with arbitrarily many strings is MAX SNP-hard.
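
To make the objective concrete, the following is a minimal brute-force sketch of how a single pattern could be scored against labeled texts. It is not the algorithm from the paper. It assumes that a pattern matches a text when its strings occur in the given order, with the start positions of consecutive strings at most k characters apart, and that agreement is the number of texts whose match outcome equals their label. The names `occurrences`, `matches`, and `agreement` are hypothetical and chosen for illustration only.

```python
from typing import List, Sequence


def occurrences(text: str, word: str) -> List[int]:
    """Return the start positions of every occurrence of word in text."""
    positions = []
    start = text.find(word)
    while start != -1:
        positions.append(start)
        start = text.find(word, start + 1)
    return positions


def matches(text: str, words: Sequence[str], k: int) -> bool:
    """Assumed matching rule: the words occur in order, and the start
    positions of consecutive words differ by more than 0 and at most k."""
    # Start positions of the current word that can end a valid chain.
    reachable = occurrences(text, words[0])
    for word in words[1:]:
        candidates = occurrences(text, word)
        reachable = [q for q in candidates
                     if any(0 < q - p <= k for p in reachable)]
        if not reachable:
            return False
    return bool(reachable)


def agreement(texts: Sequence[str], labels: Sequence[bool],
              words: Sequence[str], k: int) -> int:
    """Count the texts whose match outcome equals their binary label."""
    return sum(matches(t, words, k) == lab for t, lab in zip(texts, labels))


# Toy data, invented for this illustration.
texts = [
    "the cat sat on the mat",
    "a cat ran far away from any mat",
    "dogs bark",
]
labels = [True, False, False]
score_tight = agreement(texts, labels, ("cat", "mat"), 15)
score_loose = agreement(texts, labels, ("cat", "mat"), 30)
```

With the tighter gap the pattern separates the first text from the other two, while the looser gap also matches the second text and so agrees with fewer labels. A search over all candidate patterns would call such a scoring routine repeatedly, which is the cost the algorithm described in the abstract is designed to avoid.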