Discovering Unordered and Ordered Phrase Association Patterns for Text Mining

Authors:
Ryoichi Fujino;Hiroki Arimura;Setsuo Arikawa
Venue:
PAKDD '00 Proceedings of the 4th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Current Issues and New Applications
Year:
2000

Citing 14

The use of phrases and structured queries in information retrieval
SIGIR '91 Proceedings of the 14th annual international ACM SIGIR conference on Research and development in information retrieval

A note on the height of suffix trees
SIAM Journal on Computing

Combinatorial pattern discovery for scientific data: some preliminary results
SIGMOD '94 Proceedings of the 1994 ACM SIGMOD international conference on Management of data

Toward Efficient Agnostic Learning
Machine Learning - Special issue on computational learning theory, COLT'92

Advances in knowledge discovery and data mining
Advances in knowledge discovery and data mining

Fast discovery of association rules
Advances in knowledge discovery and data mining

A Space-Economical Suffix Tree Construction Algorithm
Journal of the ACM (JACM)

Efficient string matching: an aid to bibliographic search
Communications of the ACM

The Design and Analysis of Computer Algorithms
The Design and Analysis of Computer Algorithms

Fast Algorithms for Mining Association Rules in Large Databases
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases

A Fast Algorithm for Discovering Optimal String Patterns in Large Text Databases
ALT '98 Proceedings of the 9th International Conference on Algorithmic Learning Theory

On Classification and Regression
DS '98 Proceedings of the First International Conference on Discovery Science

Color Set Size Problem with Application to String Matching
CPM '92 Proceedings of the Third Annual Symposium on Combinatorial Pattern Matching

Maximizing Agreement with a Classification by Bounded or Unbounded Number of Associated Words
ISAAC '98 Proceedings of the 9th International Symposium on Algorithms and Computation

Cited 7

Optimized Substructure Discovery for Semi-structured Data
PKDD '02 Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery

Extracting Characteristic Structures among Words in Semistructured Documents
PAKDD '02 Proceedings of the 6th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining

Efficient Data Mining from Large Text Databases
Progress in Discovery Science, Final Report of the Japanese Discovery Science Project

Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications
CPM '01 Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching

A Practical Algorithm to Find the Best Subsequence Patterns
DS '00 Proceedings of the Third International Conference on Discovery Science

A Practical Algorithm to Find the Best Episode Patterns
DS '01 Proceedings of the 4th International Conference on Discovery Science

Location-specific tweet detection and topic summarization in Twitter
Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining

Abstract

This paper considers the problem of finding all frequent phrase association patterns in a large collection of unstructured texts, where a phrase association pattern is a set of consecutive sequences of an arbitrary number of keywords that appear together in a document. For the ordered and unordered versions of phrase association patterns, we present efficient algorithms, called Levelwise-Scan, based on the sequential counting technique of the Apriori algorithm. To cope with the huge feature space of phrase association patterns, the algorithms use a generalized suffix tree and a pattern matching automaton. Through theoretical and empirical analyses, we show that the algorithms run quickly on most random texts for a wide range of parameter values and scale up to large disk-resident text databases.
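
To make the notion of a frequent unordered phrase association pattern concrete, the following is a minimal Python sketch of Apriori-style levelwise counting. It is not the authors' Levelwise-Scan algorithm: it enumerates phrases naively instead of using a generalized suffix tree or a pattern matching automaton, it ignores the ordered variant, and it treats "appear together" simply as co-occurrence anywhere in the same document. The names `phrases_of`, `frequent_unordered_patterns`, `min_support`, `max_phrases`, and `max_len` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def phrases_of(doc, max_len):
    """Return all phrases (consecutive keyword sequences) of up to max_len words."""
    words = doc.split()
    return {
        tuple(words[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(words) - n + 1)
    }


def frequent_unordered_patterns(docs, min_support, max_phrases=2, max_len=2):
    """Map each frequent set of phrases to the number of documents containing it.

    Illustrative assumption: a pattern matches a document when every phrase
    in the set occurs somewhere in that document.
    """
    doc_phrases = [phrases_of(d, max_len) for d in docs]

    # Level 1: count single phrases by document frequency.
    counts = {}
    for ps in doc_phrases:
        for p in ps:
            key = frozenset([p])
            counts[key] = counts.get(key, 0) + 1
    level = {pat for pat, c in counts.items() if c >= min_support}
    result = {pat: counts[pat] for pat in level}

    for k in range(2, max_phrases + 1):
        # Candidate generation: extend each frequent (k-1)-pattern by one phrase.
        items = {p for pat in level for p in pat}
        candidates = set()
        for pat in level:
            for p in items - pat:
                cand = pat | {p}
                # Apriori pruning: every (k-1)-subset must itself be frequent.
                if all(frozenset(s) in level for s in combinations(cand, k - 1)):
                    candidates.add(cand)

        # Counting pass: one scan over the documents per level.
        counts = {c: 0 for c in candidates}
        for ps in doc_phrases:
            for c in candidates:
                if c <= ps:
                    counts[c] += 1

        level = {c for c, n in counts.items() if n >= min_support}
        if not level:
            break
        result.update({c: counts[c] for c in level})

    return result
```

Calling `frequent_unordered_patterns(docs, min_support=2)` on a small list of strings returns a dictionary keyed by `frozenset`s of word tuples. The naive phrase enumeration is the part that grows quickly with text size, which is the feature-space problem the abstract says the suffix tree and automaton are meant to address.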