We introduce a new notion of motifs, called masks, that succinctly represents the repeated patterns of an input sequence T of n symbols drawn from an alphabet Σ. We show how to build the set of all frequent maximal masks of length L in O(2^L n) time and space in the worst case, using the Karp-Miller-Rosenberg approach. We show analytically that our algorithm performs better than the method that, after a polynomial-time preprocessing of T, enumerates all (|Σ|+1)^L potential candidate patterns and checks each of them against T in constant time. Our algorithm is also cache-friendly, attaining O(2^L sort(n)) block transfers, where sort(n) is the cache complexity of sorting n items.
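To make the counting argument concrete, the following is a minimal brute-force sketch, not the algorithm described above. It assumes that a mask is a binary string of length L in which 1 marks a position that must match and 0 marks a don't-care position; the abstract does not define masks, so this reading is an assumption. The function name `frequent_masked_patterns`, the frequency threshold `q`, and the use of `.` as the don't-care symbol are likewise hypothetical. The sketch ignores maximality and does not use the Karp-Miller-Rosenberg approach: it simply projects every length-L window of T through each of the 2^L masks and counts the results, which takes O(2^L n L) time.

```python
from collections import Counter
from itertools import product


def frequent_masked_patterns(text, L, q):
    """Brute-force illustration (assumed mask semantics, see lead-in).

    For every binary mask of length L, project each length-L window of
    `text` through the mask (keep the symbol where the mask has '1',
    write '.' where it has '0') and keep the projected patterns that
    occur at least q times.
    """
    result = {}
    for bits in product("01", repeat=L):  # all 2^L masks
        mask = "".join(bits)
        counts = Counter(
            "".join(c if b == "1" else "." for c, b in zip(text[i:i + L], mask))
            for i in range(len(text) - L + 1)  # all n - L + 1 windows
        )
        frequent = {p: k for p, k in counts.items() if k >= q}
        if frequent:
            result[mask] = frequent
    return result


if __name__ == "__main__":
    # Toy input: windows of length 3 are abc, bca, cab, abd.
    for mask, patterns in frequent_masked_patterns("abcabd", 3, 2).items():
        print(mask, patterns)
```

The outer loop runs over 2^L masks rather than over (|Σ|+1)^L candidate patterns, which is the source of the gap between the two bounds compared in the abstract: for each mask, only the patterns that actually occur in T are ever generated.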