Computing discriminating and generic words

Authors:
Gregory Kucherov;Yakov Nekrich;Tatiana Starikovskaya
Affiliations:
Laboratoire d'Informatique Gaspard Monge, Université Paris-Est & CNRS, Paris, France;Department of Computer Science, University of Chile, Santiago, Chile;Lomonosov Moscow State University, Moscow, Russia, and Laboratoire d'Informatique Gaspard Monge, Université Paris-Est & CNRS, Paris, France
Venue:
SPIRE'12 Proceedings of the 19th international conference on String Processing and Information Retrieval
Year:
2012

Abstract

We study three problems of computing generic or discriminating words for a given collection of documents. Given a pattern P and a threshold d, we want to report (i) all longest extensions of P that occur in at least d documents, (ii) all shortest extensions of P that occur in fewer than d documents, and (iii) all shortest extensions of P that occur only in d selected documents. For these problems, we propose efficient algorithms based on suffix trees and advanced data structure techniques. For problem (i), we propose an optimal solution with constant running time per output word.
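
To make the problem statements concrete, the following brute-force sketch enumerates answers to problems (i) and (ii) on a tiny collection. It is not the suffix-tree method the abstract refers to and makes no attempt at efficiency. Several points are assumptions made for illustration only: an "extension" of P is taken to be P followed by zero or more characters, "longest" is read as "cannot be extended by one more character without dropping below d documents", only words that actually occur in some document are considered, and the function names are hypothetical.

```python
from typing import List, Set


def doc_frequency(word: str, docs: List[str]) -> int:
    """Number of documents that contain `word` as a substring."""
    return sum(1 for doc in docs if word in doc)


def right_extensions(pattern: str, docs: List[str]) -> Set[str]:
    """All substrings of the documents that start with `pattern`
    (assumption: an extension appends characters on the right only)."""
    found: Set[str] = set()
    for doc in docs:
        start = doc.find(pattern)
        while start != -1:
            for end in range(start + len(pattern), len(doc) + 1):
                found.add(doc[start:end])
            start = doc.find(pattern, start + 1)
    return found


def maximal_generic(pattern: str, docs: List[str], d: int) -> List[str]:
    """Problem (i), naive version: extensions occurring in at least d
    documents that have no one-character-longer extension with the
    same property."""
    frequent = {w for w in right_extensions(pattern, docs)
                if doc_frequency(w, docs) >= d}
    return sorted(
        w for w in frequent
        if not any(len(v) == len(w) + 1 and v.startswith(w) for v in frequent)
    )


def minimal_discriminating(pattern: str, docs: List[str], d: int) -> List[str]:
    """Problem (ii), naive version: extensions occurring in fewer than d
    documents whose one-character-shorter prefix (if still an extension
    of the pattern) occurs in at least d documents."""
    result = []
    for w in right_extensions(pattern, docs):
        if doc_frequency(w, docs) >= d:
            continue
        if w == pattern or doc_frequency(w[:-1], docs) >= d:
            result.append(w)
    return sorted(result)


if __name__ == "__main__":
    docs = ["abcab", "abd", "xabc"]
    print(maximal_generic("ab", docs, 2))         # ['abc']
    print(minimal_discriminating("ab", docs, 2))  # ['abca', 'abd']
```

In this toy collection, "ab" occurs in all three documents and "abc" in two, so with d = 2 the only maximal generic extension of "ab" is "abc", while "abca" and "abd" are the shortest extensions that fall below the threshold.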