A variable-length-don't-care pattern (VLDC pattern) is an element of the set Π = (Σ ∪ {*})*, where Σ is an alphabet and * is a wildcard matching any string in Σ*. Given two sets of strings, we consider the problem of finding the VLDC pattern that is the most common to one set and the least common to the other. We present a practical algorithm that finds such best VLDC patterns exactly and is greatly sped up by pruning heuristics. We introduce two versions of our algorithm: one employs a pattern matching machine (PMM), whereas the other uses an index structure called the Wildcard Directed Acyclic Word Graph (WDAWG). In addition, we consider a more general problem of finding the best pair ⟨q, k⟩, where k is the window size that specifies the length of an occurrence of the VLDC pattern q matching a string w. We present three algorithms that solve this problem with pruning heuristics, using dynamic programming (DP), PMMs, and WDAWGs, respectively. Although both problems are NP-hard, we show experimentally that our algorithms run remarkably fast.
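To make the first problem concrete, here is a minimal brute-force sketch in Python. It is not the authors' algorithm: it uses no pruning heuristics, PMM, or WDAWG. It assumes whole-string matching semantics (q matches w when w belongs to the language of q, each * standing for any string, including the empty one). It also assumes a simple, hypothetical score: positives matched minus negatives matched. The function names `vldc_match`, `score`, and `best_vldc_pattern` are illustrative only. Because the problem is NP-hard, this exhaustive search grows exponentially with the maximum pattern length `max_len`.

```python
from itertools import product

WILDCARD = "*"


def vldc_match(pattern: str, text: str) -> bool:
    """True if the whole of `text` is in the language of `pattern`,
    where each '*' matches any string (including the empty string)."""
    segments = pattern.split(WILDCARD)
    if len(segments) == 1:  # no wildcard: exact match required
        return pattern == text
    first, *middle, last = segments
    # The first segment must be a prefix, the last a suffix, without overlap.
    if len(first) + len(last) > len(text):
        return False
    if not (text.startswith(first) and text.endswith(last)):
        return False
    pos, end = len(first), len(text) - len(last)
    # Placing each middle segment at its leftmost occurrence is safe:
    # it leaves the most room for the segments that follow.
    for seg in middle:
        idx = text.find(seg, pos, end)
        if idx < 0:
            return False
        pos = idx + len(seg)
    return True


def score(pattern, positives, negatives):
    """Hypothetical score: positives matched minus negatives matched."""
    pos_hits = sum(vldc_match(pattern, s) for s in positives)
    neg_hits = sum(vldc_match(pattern, s) for s in negatives)
    return pos_hits - neg_hits


def candidate_patterns(alphabet, max_len):
    """Enumerate VLDC patterns of length 1..max_len over alphabet + {'*'}."""
    symbols = list(alphabet) + [WILDCARD]
    for n in range(1, max_len + 1):
        for tup in product(symbols, repeat=n):
            p = "".join(tup)
            if WILDCARD * 2 not in p:  # "**" is equivalent to "*"
                yield p


def best_vldc_pattern(positives, negatives, alphabet, max_len):
    """Exhaustive search for the highest-scoring pattern (first one wins ties)."""
    best, best_score = None, float("-inf")
    for p in candidate_patterns(alphabet, max_len):
        s = score(p, positives, negatives)
        if s > best_score:
            best, best_score = p, s
    return best, best_score
```

For example, with positives `["abcab", "aab", "acb"]` and negatives `["ba", "bb", "abba"]` over the alphabet `"abc"`, `best_vldc_pattern(..., max_len=3)` returns `('a*b', 3)`: every positive starts with `a` and ends with `b`, and no negative does. The pruning heuristics described above exist precisely to avoid scoring every candidate as this sketch does.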