Computing discriminating and generic words

  • Authors:
  • Gregory Kucherov; Yakov Nekrich; Tatiana Starikovskaya

  • Affiliations:
  • Laboratoire d'Informatique Gaspard Monge, Université Paris-Est & CNRS, Paris, France; Department of Computer Science, University of Chile, Santiago, Chile; Lomonosov Moscow State University, Moscow, Russia and Laboratoire d'Informatique Gaspard Monge, Université Paris-Est & CNRS, Paris, France

  • Venue:
  • SPIRE'12: Proceedings of the 19th International Conference on String Processing and Information Retrieval
  • Year:
  • 2012

Abstract

We study the following three problems of computing generic or discriminating words for a given collection of documents. Given a pattern P and a threshold d, we want to report (i) all longest extensions of P which occur in at least d documents, (ii) all shortest extensions of P which occur in fewer than d documents, and (iii) all shortest extensions of P which occur only in d selected documents. For these problems, we propose efficient algorithms based on suffix trees and advanced data structure techniques. For problem (i), we propose an optimal solution with constant running time per output word.
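
To make problems (i) and (ii) concrete, here is a naive brute-force sketch in Python. It is not the suffix-tree approach of the paper and makes no attempt at efficiency. It assumes that an "extension" of P is a word that has P as a prefix and occurs in at least one document, that "longest" means no one-letter extension still occurs in at least d documents, and that "shortest" means every shorter extension of P occurs in at least d documents. The function names are hypothetical and chosen only for illustration.

```python
def doc_count(word, docs):
    """Number of documents that contain `word` as a substring."""
    return sum(1 for doc in docs if word in doc)


def right_extensions(pattern, docs):
    """All substrings of the documents that start with `pattern` (assumed meaning of 'extension')."""
    exts = set()
    for doc in docs:
        start = doc.find(pattern)
        while start != -1:
            for end in range(start + len(pattern), len(doc) + 1):
                exts.add(doc[start:end])
            start = doc.find(pattern, start + 1)
    return exts


def maximal_generic(pattern, docs, d):
    """Problem (i), brute force: extensions in >= d documents that cannot be extended further."""
    frequent = {w for w in right_extensions(pattern, docs) if doc_count(w, docs) >= d}
    return sorted(
        w for w in frequent
        if not any(len(v) == len(w) + 1 and v.startswith(w) for v in frequent)
    )


def minimal_discriminating(pattern, docs, d):
    """Problem (ii), brute force: extensions in < d documents whose one-letter-shorter prefix is in >= d."""
    result = []
    for w in right_extensions(pattern, docs):
        if doc_count(w, docs) >= d:
            continue
        # Document frequency never increases when a word is extended,
        # so checking the prefix one letter shorter is sufficient.
        if w == pattern or doc_count(w[:-1], docs) >= d:
            result.append(w)
    return sorted(result)


docs = ["abcab", "abd", "xabc"]
print(maximal_generic("ab", docs, 2))         # ['abc']
print(minimal_discriminating("ab", docs, 2))  # ['abca', 'abd']
```

In this toy collection, "ab" occurs in all three documents and "abc" in two, so with d = 2 the only longest generic extension is "abc", while "abca" and "abd" are the shortest extensions that fall below the threshold.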