We consider the Maximum Agreement Problem: given sets of positive and negative documents, find a characteristic set that matches many of the positive documents but rejects many of the negative ones. A characteristic set is a sequence (x_1, ..., x_d) of strings such that each x_i is a suffix of x_{i+1} and all the x_i appear in a document without overlapping. A characteristic set matches semi-structured documents with primitives or user-defined macros. For example, ("set", "characteristic set", "〈/title〉 characteristic set") is a characteristic set extracted from an HTML file. An algorithm that solves the Maximum Agreement Problem does not output useless characteristic sets, such as those consisting only of HTML tags, since such sets may match most of the positive documents but also match most of the negative ones. We present an algorithm that, given an integer d specifying the number of strings in a characteristic set, solves the Maximum Agreement Problem in O(n^2 h^d) time, where n is the total length of the documents and h is the height of the suffix tree of the documents.
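
The definitions above can be illustrated with a small brute-force sketch. It is not the suffix-tree algorithm of the paper and makes no attempt at the stated time bound. It rests on two assumptions that the abstract does not spell out: the strings may occur in the document in any order as long as their occurrences do not overlap, and the agreement of a characteristic set is the number of positive documents it matches plus the number of negative documents it rejects. The function names (`is_suffix_chain`, `matches`, `agreement`) are hypothetical.

```python
from typing import List, Sequence, Tuple


def is_suffix_chain(xs: Sequence[str]) -> bool:
    """Check that each string is a suffix of the next one in the sequence."""
    return all(xs[i + 1].endswith(xs[i]) for i in range(len(xs) - 1))


def occurrences(text: str, pat: str) -> List[int]:
    """Return every start position of pat in text (occurrences may overlap)."""
    starts = []
    i = text.find(pat)
    while i != -1:
        starts.append(i)
        i = text.find(pat, i + 1)
    return starts


def matches(xs: Sequence[str], doc: str) -> bool:
    """Assumed matching rule: xs is a suffix chain and every string in xs
    has its own occurrence in doc, with no two chosen occurrences overlapping."""
    if not is_suffix_chain(xs):
        return False
    # Place the longest strings first; they have the fewest candidate positions.
    order = sorted(xs, key=len, reverse=True)

    def place(k: int, used: List[Tuple[int, int]]) -> bool:
        if k == len(order):
            return True
        pat = order[k]
        for start in occurrences(doc, pat):
            end = start + len(pat)
            # Accept this position only if it is disjoint from all chosen ones.
            if all(end <= s or start >= e for s, e in used):
                if place(k + 1, used + [(start, end)]):
                    return True
        return False

    return place(0, [])


def agreement(xs: Sequence[str], positives: Sequence[str],
              negatives: Sequence[str]) -> int:
    """Assumed score: positives matched plus negatives rejected."""
    matched = sum(1 for d in positives if matches(xs, d))
    rejected = sum(1 for d in negatives if not matches(xs, d))
    return matched + rejected
```

Under these assumptions, `matches(("set", "characteristic set"), "a characteristic set is a set")` holds because the second "set" lies outside the occurrence of "characteristic set", whereas the document "characteristic set" alone is rejected because its only "set" overlaps the longer string.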