This paper considers the problem of finding all frequent phrase association patterns in a large collection of unstructured texts, where a phrase association pattern is a set of phrases, each a consecutive sequence of an arbitrary number of keywords, that appear together in a document. For both the ordered and the unordered versions of phrase association patterns, we present efficient algorithms, called Levelwise-Scan, based on the sequential counting technique of the Apriori algorithm. To cope with the huge feature space of phrase association patterns, the algorithms use the generalized suffix tree and the pattern matching automaton. Through theoretical and empirical analyses, we show that the algorithms run quickly on most random texts for a wide range of parameter values and scale up to large disk-resident text databases.
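To make the levelwise idea concrete, the following is a minimal Python sketch of Apriori-style counting for the unordered version of the problem. It is not the Levelwise-Scan algorithm itself: it enumerates phrases naively instead of using a generalized suffix tree or a pattern matching automaton, it keeps everything in memory, and it caps the phrase length. The function names and the parameters `max_len` and `max_size` are assumptions introduced only for this illustration.

```python
from itertools import combinations


def phrases_in(doc, max_len):
    """Return all phrases (tuples of consecutive words) of length 1..max_len."""
    return {
        tuple(doc[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(doc) - n + 1)
    }


def frequent_phrase_sets(docs, min_support, max_len=2, max_size=2):
    """Map each frequent set of phrases to the number of documents containing it.

    Naive illustration only: a pattern is a frozenset of phrases, and a
    document supports it when every phrase in the set occurs in the document.
    """
    doc_phrases = [phrases_in(doc, max_len) for doc in docs]

    # Level 1: count single phrases with one pass over the documents.
    counts = {}
    for phrases in doc_phrases:
        for phrase in phrases:
            key = frozenset([phrase])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(level)

    size = 1
    while level and size < max_size:
        size += 1
        # Candidate generation: join two frequent sets of the previous level
        # and keep the union only if all of its (size - 1)-subsets are frequent.
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == size and all(
                frozenset(sub) in level for sub in combinations(union, size - 1)
            ):
                candidates.add(union)

        # Counting: one pass over the documents per level.
        counts = {c: 0 for c in candidates}
        for phrases in doc_phrases:
            for cand in candidates:
                if cand <= phrases:
                    counts[cand] += 1
        level = {s: c for s, c in counts.items() if c >= min_support}
        result.update(level)

    return result
```

Calling `frequent_phrase_sets(docs, min_support=2)` on a list of tokenized documents returns a dictionary keyed by `frozenset` objects of word tuples. The pruning step reflects the Apriori property that every subset of a frequent pattern must itself be frequent, which is what keeps the number of candidates manageable at each level.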