Maximizing Agreement with a Classification by Bounded or Unbounded Number of Associated Words

Authors:
Hiroki Arimura; Shinichi Shimozono
Venue:
ISAAC '98 Proceedings of the 9th International Symposium on Algorithms and Computation
Year:
1998

Citing (12):

An algorithm for string matching with a sequence of don't cares
Information Processing Letters

A note on the height of suffix trees
SIAM Journal on Computing

Mining association rules between sets of items in large databases
SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data

Combinatorial pattern discovery for scientific data: some preliminary results
SIGMOD '94 Proceedings of the 1994 ACM SIGMOD international conference on Management of data

Toward Efficient Agnostic Learning
Machine Learning - Special issue on computational learning theory, COLT'92

Approximate solution of NP optimization problems
Theoretical Computer Science

Data mining using two-dimensional optimized association rules: scheme, algorithms, and visualization
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data

Computing the maximum bichromatic discrepancy, with applications to computer graphics and machine learning
Journal of Computer and System Sciences

A Space-Economical Suffix Tree Construction Algorithm
Journal of the ACM (JACM)

A Linear-Time Algorithm for Computing Characteristic Strings
ISAAC '94 Proceedings of the 5th International Symposium on Algorithms and Computation

A Fast Algorithm for Discovering Optimal String Patterns in Large Text Databases
ALT '98 Proceedings of the 9th International Conference on Algorithmic Learning Theory

Approximation algorithms for combinatorial problems
Journal of Computer and System Sciences

Cited by (8):

Discovering Unordered and Ordered Phrase Association Patterns for Text Mining
PAKDD '00 Proceedings of the 4th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Current Issues and New Applications

Visualization and Analysis of Web Graphs
Progress in Discovery Science, Final Report of the Japanese Discovery Science Project

An Efficient Tool for Discovering Simple Combinatorial Patterns from Large Text Databases
DS '98 Proceedings of the First International Conference on Discovery Science

Characteristic Sets of Strings Common to Semi-structured Documents
DS '99 Proceedings of the Second International Conference on Discovery Science

Extracting Positive and Negative Keywords for Web Communities
DS '00 Proceedings of the Third International Conference on Discovery Science

A Practical Algorithm to Find the Best Subsequence Patterns
DS '00 Proceedings of the Third International Conference on Discovery Science

Eliminating Useless Parts in Semi-structured Documents Using Alternation Counts
DS '01 Proceedings of the 4th International Conference on Discovery Science

Mining Peculiar Compositions of Frequent Substrings from Sparse Text Data Using Background Texts
ECML PKDD '09 Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases: Part I

Abstract

We study the efficient discovery of word-association patterns, each defined by a sequence of strings and a proximity gap, from a collection of texts with binary labels. We present an algorithm that finds all word-association patterns with d strings and proximity k that maximize agreement with the labels. If the texts are uniformly random strings, it runs in expected time $O(k^{d-1} n \log^{d+1} n)$ and space $O(k^{d-1} n)$, where n is the total length of the texts. We also show that the problem of finding a best word-association pattern with arbitrarily many strings is MAX SNP-hard.
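
To make the objective in the abstract concrete, the following is a minimal brute-force sketch of how a single word-association pattern might be scored against labeled texts. It is not the algorithm of the paper and does not achieve the stated bounds. The matching semantics are an assumption made for illustration: the strings must occur in order, and the start positions of consecutive strings must differ by more than 0 and at most k. The names `occurrences`, `matches`, `agreement`, and `sample` are hypothetical.

```python
from typing import List, Sequence, Tuple


def occurrences(text: str, word: str) -> List[int]:
    """Return every start position of `word` in `text` (overlaps allowed)."""
    positions = []
    start = text.find(word)
    while start != -1:
        positions.append(start)
        start = text.find(word, start + 1)
    return positions


def matches(text: str, words: Sequence[str], k: int) -> bool:
    """Check whether `words` occur in order with consecutive start
    positions at most k apart (assumed semantics: 0 < gap <= k)."""
    # `reachable` holds the start positions of the most recent word that
    # can be reached by a valid chain of occurrences of the earlier words.
    reachable = set(occurrences(text, words[0]))
    for word in words[1:]:
        reachable = {
            pos
            for pos in occurrences(text, word)
            if any(0 < pos - prev <= k for prev in reachable)
        }
        if not reachable:
            return False
    return bool(reachable)


def agreement(sample: Sequence[Tuple[str, bool]],
              words: Sequence[str], k: int) -> int:
    """Count the texts whose binary label equals the pattern's match result."""
    return sum(matches(text, words, k) == label for text, label in sample)


# Hypothetical toy sample: (text, label) pairs.
sample = [
    ("the cat sat on the mat", True),
    ("the mat is flat", False),
    ("cat on a very long mat", True),
]
print(agreement(sample, ["cat", "mat"], 20))
```

Under these assumptions, finding a best pattern would amount to maximizing `agreement` over all candidate string sequences for fixed d and k, which this sketch does not attempt to do efficiently.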