We consider mining unusual patterns from a text T. Unlike existing methods, which assume probabilistic models and rely on simple estimation methods, we use a set B of background texts in addition to T, and we take compositions w = xy of strings x and y as patterns. A string w is peculiar if there exist x and y such that w = xy, each of x and y is more frequent in B than in T, and, conversely, w = xy is more frequent in T than in B. The frequency of xy in T is very small, since x and y are infrequent in T, yet xy is relatively abundant in T compared with its frequency in B. Despite these complex conditions, we develop a fast algorithm that finds peculiar compositions using a suffix tree. Experiments on DNA sequences show that our algorithm scales well owing to its pruning techniques, and they demonstrate the superiority of the peculiar-composition concept.
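To make the definition of a peculiar composition concrete, here is a minimal brute-force sketch that checks the three conditions directly. It is not the suffix-tree algorithm described above and has none of its pruning. It assumes that "more frequent" compares relative frequencies, i.e. occurrence counts divided by text length, so that T and B of different sizes can be compared. The function names and the `max_len` cap on pattern length are hypothetical, introduced only for illustration.

```python
from collections import Counter


def substring_counts(text: str, max_len: int) -> Counter:
    """Count (overlapping) occurrences of every substring of length 1..max_len."""
    counts: Counter = Counter()
    n = len(text)
    for i in range(n):
        for length in range(1, min(max_len, n - i) + 1):
            counts[text[i:i + length]] += 1
    return counts


def peculiar_compositions(t: str, b: str, max_len: int = 8) -> list[tuple[str, str, str]]:
    """Return triples (w, x, y) with w = x + y such that x and y are each
    relatively more frequent in B than in T, while w is relatively more
    frequent in T than in B.

    Hypothetical brute-force sketch: relative frequency is assumed to be
    count / text length, and only patterns up to max_len are examined.
    """
    if not t or not b:
        return []
    ct, cb = substring_counts(t, max_len), substring_counts(b, max_len)

    def freq_t(s: str) -> float:
        return ct[s] / len(t)  # Counter returns 0 for unseen substrings

    def freq_b(s: str) -> float:
        return cb[s] / len(b)

    results = []
    # Only substrings that occur in T can be more frequent in T than in B.
    for w in sorted(ct):
        if len(w) < 2 or freq_t(w) <= freq_b(w):
            continue
        # Try every split point w = x + y with both parts non-empty.
        for k in range(1, len(w)):
            x, y = w[:k], w[k:]
            if freq_b(x) > freq_t(x) and freq_b(y) > freq_t(y):
                results.append((w, x, y))
    return results


if __name__ == "__main__":
    # "a" and "b" are common in B but rare in T; "ab" is relatively more common in T.
    print(peculiar_compositions("abcccccc", "a" * 8 + "b" * 8))
    # [('ab', 'a', 'b')]
```

In the usage example, "a" and "b" each have relative frequency 0.5 in B but 0.125 in T, while "ab" has 0.125 in T against 0.0625 in B, so ("ab", "a", "b") is reported. The brute force enumerates every substring of T up to `max_len` and every split point, which is exactly the cost that a suffix-tree representation with pruning is meant to avoid on long sequences such as DNA.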