We study three problems of computing generic or discriminating words for a given collection of documents. Given a pattern P and a threshold d, we want to report (i) all longest extensions of P that occur in at least d documents, (ii) all shortest extensions of P that occur in fewer than d documents, and (iii) all shortest extensions of P that occur only in d selected documents. For these problems, we propose efficient algorithms based on suffix trees and advanced data structure techniques. For problem (i), we propose an optimal solution with constant running time per output word.
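
To make the definitions of problems (i) and (ii) concrete, here is a minimal brute-force sketch in Python. It is not the suffix-tree method proposed above and makes no attempt at efficiency. It rests on several assumptions that the abstract does not state: an "extension" of P is read as a word that has P as a prefix, "longest" is read as maximal (no one-character extension still occurs in at least d documents), "shortest" is read as minimal (dropping the last character yields a word that no longer occurs in fewer than d documents), and a reported word must occur in at least one document. The function names are hypothetical, and problem (iii) is not covered.

```python
def doc_frequency(word, docs):
    """Number of documents that contain `word` as a substring."""
    return sum(1 for doc in docs if word in doc)


def right_extensions(pattern, docs):
    """All substrings of the documents that start with `pattern`.

    Assumption: an extension of P is a word having P as a prefix.
    """
    exts = set()
    for doc in docs:
        start = doc.find(pattern)
        while start != -1:
            # Every substring beginning at this occurrence extends the pattern.
            for end in range(start + len(pattern), len(doc) + 1):
                exts.add(doc[start:end])
            start = doc.find(pattern, start + 1)
    return exts


def maximal_generic(pattern, d, docs):
    """Problem (i), read as: maximal extensions occurring in >= d documents."""
    generic = {w for w in right_extensions(pattern, docs)
               if doc_frequency(w, docs) >= d}
    # Keep a word only if no one-character extension of it is also generic.
    return sorted(
        w for w in generic
        if not any(len(v) == len(w) + 1 and v.startswith(w) for v in generic)
    )


def minimal_discriminating(pattern, d, docs):
    """Problem (ii), read as: minimal extensions occurring in < d documents."""
    rare = {w for w in right_extensions(pattern, docs)
            if doc_frequency(w, docs) < d}
    # Keep a word only if its one-character-shorter prefix is not itself rare.
    return sorted(
        w for w in rare
        if len(w) == len(pattern) or w[:-1] not in rare
    )


if __name__ == "__main__":
    docs = ["abcab", "abd", "xabc"]
    print(maximal_generic("ab", 2, docs))
    print(minimal_discriminating("ab", 2, docs))
```

Because the number of documents containing a word can only stay the same or drop as the word is extended, checking one-character neighbours is enough to decide maximality and minimality in this sketch.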