We study the efficient discovery of word-association patterns, each defined by a sequence of strings and a proximity gap, from a collection of texts with binary labels. We present an algorithm that finds, among all word-association patterns consisting of $d$ strings with proximity $k$, those that maximize agreement with the labels. If the texts are uniformly random strings, it runs in $O(k^{d-1} n \log^{d+1} n)$ expected time and $O(k^{d-1} n)$ space, where $n$ is the total length of the texts. We also show that finding a best word-association pattern with arbitrarily many strings is MAX SNP-hard.
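To make the objective concrete, here is a minimal Python sketch of what "maximize agreement with the labels" means. It assumes a matching semantics that the abstract does not spell out: a pattern of strings $p_1, \dots, p_d$ with proximity $k$ matches a text if the strings occur in that order and each $p_{i+1}$ starts strictly after, and at most $k$ positions after, the start of $p_i$. The function names `matches`, `agreement`, and `best_pattern` are hypothetical. `best_pattern` is a brute-force baseline that enumerates every $d$-tuple of short substrings; it is not the paper's algorithm and does not achieve the stated time bound.

```python
from itertools import product


def occurrences(text: str, s: str) -> list[int]:
    """Return every start position of s in text, overlapping ones included."""
    positions = []
    start = text.find(s)
    while start != -1:
        positions.append(start)
        start = text.find(s, start + 1)
    return positions


def matches(pattern: tuple[str, ...], k: int, text: str) -> bool:
    """Assumed semantics: the strings occur in order, and each one starts
    0 < gap <= k positions after the start of the previous one."""
    # Start positions of pattern[i] that extend some valid match of pattern[:i].
    reachable = occurrences(text, pattern[0])
    for s in pattern[1:]:
        reachable = [o for o in occurrences(text, s)
                     if any(0 < o - r <= k for r in reachable)]
        if not reachable:
            return False
    return bool(reachable)


def agreement(pattern: tuple[str, ...], k: int,
              texts: list[str], labels: list[bool]) -> int:
    """Count the texts on which 'pattern matches' equals the binary label."""
    return sum(matches(pattern, k, t) == y for t, y in zip(texts, labels))


def substrings(texts: list[str], max_len: int) -> list[str]:
    """All distinct non-empty substrings of length <= max_len, sorted."""
    subs = set()
    for t in texts:
        for i in range(len(t)):
            for j in range(i + 1, min(len(t), i + max_len) + 1):
                subs.add(t[i:j])
    return sorted(subs)


def best_pattern(texts: list[str], labels: list[bool], d: int, k: int,
                 max_len: int) -> tuple[tuple[str, ...], int]:
    """Brute force: try every d-tuple of candidate substrings and keep the
    first one (in enumeration order) with the highest agreement."""
    best, best_score = None, -1
    for pattern in product(substrings(texts, max_len), repeat=d):
        score = agreement(pattern, k, texts, labels)
        if score > best_score:
            best, best_score = pattern, score
    return best, best_score


if __name__ == "__main__":
    texts = ["abc", "xabyc", "cab", "zzz"]
    labels = [True, True, False, False]
    print(agreement(("ab", "c"), 3, texts, labels))
    print(best_pattern(texts, labels, d=2, k=3, max_len=2))
```

With $C$ candidate substrings the baseline evaluates $C^d$ patterns and rescans every text for each one, which is why an algorithm that avoids this enumeration matters as $d$ and $n$ grow.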