Fast algorithms for finding nearest common ancestors
SIAM Journal on Computing
A Space-Economical Suffix Tree Construction Algorithm
Journal of the ACM (JACM)
On the sorting-complexity of suffix tree construction
Journal of the ACM (JACM)
Verifying candidate matches in sparse and wildcard matching
STOC '02 Proceedings of the thirty-fourth annual ACM symposium on Theory of computing
Journal of Algorithms
Faster Algorithms for String Matching Problems: Matching the Convolution Bound
FOCS '98 Proceedings of the 39th Annual Symposium on Foundations of Computer Science
Efficient text fingerprinting via Parikh mapping
Journal of Discrete Algorithms
Dictionary matching and indexing with errors and don't cares
STOC '04 Proceedings of the thirty-sixth annual ACM symposium on Theory of computing
Novel Transformation Techniques Using Q-Heaps with Applications to Computational Geometry
SIAM Journal on Computing
Linear work suffix array construction
Journal of the ACM (JACM)
Journal of Discrete Algorithms
Linear pattern matching algorithms
SWAT '73 Proceedings of the 14th Annual Symposium on Switching and Automata Theory (swat 1973)
Haplotype inference by pure Parsimony
CPM'03 Proceedings of the 14th annual conference on Combinatorial pattern matching
New algorithms for text fingerprinting
CPM'06 Proceedings of the 17th Annual conference on Combinatorial Pattern Matching
Suffix trays and suffix trists: structures for faster text indexing
ICALP'06 Proceedings of the 33rd international conference on Automata, Languages and Programming - Volume Part I
A hidden Markov technique for haplotype reconstruction
WABI'05 Proceedings of the 5th International conference on Algorithms in Bioinformatics
ICCS'06 Proceedings of the 6th international conference on Computational Science - Volume Part II
Haplotype inference via hierarchical genotype parsing
WABI'07 Proceedings of the 7th international conference on Algorithms in Bioinformatics
We consider a subset matching variant of the Dictionary Query problem. Consider a dictionary D of n strings, where each string location contains a set of characters drawn from some alphabet Σ={1,...,|Σ|}. Our goal is to preprocess D so that, given a query pattern p in which each location contains a single character from Σ, we can decide whether p matches D. We say that p matches D if there is some s∈D with |p|=|s| and p[i]∈s[i] for every 1≤i≤|p|. To achieve a query time of O(|p|), we construct a compressed trie of all possible patterns that appear in D. Assuming that for every s∈D there are at most k locations where |s[i]|>1, we present two constructions of the trie that yield a preprocessing time of O(nm + |Σ|^k n log(min{n,m})), where n is the number of strings in D and m is the maximum length of a string in D. The first construction is based on divide and conquer, and the second uses ideas introduced in [2] for text fingerprinting. Furthermore, we show how to obtain O(nm + |Σ|^k n + |Σ|^(k/2) n log(min{n,m})) preprocessing time and O(|p| log log |Σ| + min{|p|, log(|Σ|^k n)} log log(|Σ|^k n)) query time by cutting the dictionary strings and constructing two compressed tries. Our problem is motivated by haplotype inference from a library of genotypes [13,16]. There, D is a known library of genotypes (|Σ|=2), and p is a haplotype. Indexing all possible haplotypes that can be inferred from D, and gathering statistical information about them, can be used to accelerate various haplotype inference algorithms.
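The matching definition and the idea of indexing every pattern that appears in D can be illustrated with a small Python sketch. This is a naive baseline under stated assumptions, not either of the constructions described in the abstract: it expands each dictionary string into all of its patterns (at most |Σ|^k per string) and inserts them into an uncompressed trie one by one, so it does not achieve the stated preprocessing bounds. The names `matches`, `build_trie`, `query`, and `END` are hypothetical and chosen only for illustration.

```python
from itertools import product

END = object()  # sentinel key marking the end of a stored pattern (illustrative)


def matches(p, s):
    """Direct check of the definition: |p| = |s| and p[i] in s[i] for every i."""
    return len(p) == len(s) and all(c in chars for c, chars in zip(p, s))


def build_trie(dictionary):
    """Naively insert every pattern of every dictionary string into a trie.

    Each dictionary string is a list of character sets. A string with at
    most k positions holding more than one character expands into at most
    |Sigma|**k patterns. The trie here is uncompressed (nested dicts).
    """
    root = {}
    for s in dictionary:
        for pattern in product(*s):  # all ways to pick one character per position
            node = root
            for c in pattern:
                node = node.setdefault(c, {})
            node[END] = True
    return root


def query(trie, p):
    """Walk the trie along p; takes O(|p|) dictionary lookups."""
    node = trie
    for c in p:
        node = node.get(c)
        if node is None:
            return False
    return END in node


# Toy dictionary over the alphabet {1, 2}; each string has k = 1 ambiguous position.
D = [
    [{1}, {1, 2}, {2}],
    [{2}, {2}, {1, 2}],
]
trie = build_trie(D)
```

With this toy dictionary, `query(trie, (1, 2, 2))` agrees with `any(matches((1, 2, 2), s) for s in D)`, and a pattern of the wrong length, such as `(1, 2)`, is rejected because no stored pattern ends at that node.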