Group comparison is a fundamental task in many scientific endeavours and is also the basis of any classifier. Contrast sets and emerging patterns capture differences between groups of categorical data, and comparing groups of sequence data is equally relevant in many applications. We define Emerging Sequences (ESs) as subsequences that are frequent in the sequences of one group and less frequent in the sequences of another, and that therefore distinguish, or contrast, sequences of different classes. Two challenges arise when distinguishing sequence classes: extracting ESs efficiently is not trivial, and only exact matches of sequences are usually considered. We address these problems with a suffix-tree-based framework and a similarity-based matching mechanism, and we propose a classifier based on Emerging Sequences. In experiments on two datasets, we evaluate our model against two baseline learning algorithms, one based on frequent subsequences and one based on exact-matching subsequences, and it outperforms both by up to 20% in prediction accuracy.
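To make the definition of an Emerging Sequence concrete, the following is a minimal sketch of a brute-force check and not the suffix-tree framework described above. It assumes that support is the fraction of sequences in a group containing the pattern as an ordered subsequence with gaps allowed, and that "less frequent" is judged by a growth-rate threshold. The function names, `min_support`, `min_growth`, and the toy data are all hypothetical.

```python
from typing import Iterable, Sequence


def is_subsequence(pattern: Sequence, sequence: Iterable) -> bool:
    """Return True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the iterator, so matches must appear in order.
    return all(item in it for item in pattern)


def support(pattern: Sequence, group: Sequence) -> float:
    """Fraction of sequences in `group` that contain `pattern`."""
    if not group:
        return 0.0
    return sum(is_subsequence(pattern, s) for s in group) / len(group)


def is_emerging(pattern, target, other, min_support=0.5, min_growth=2.0) -> bool:
    """Assumed criterion: frequent in `target` and at least `min_growth`
    times more frequent there than in `other`."""
    s_target = support(pattern, target)
    s_other = support(pattern, other)
    if s_target < min_support:
        return False
    if s_other == 0.0:
        return True  # present in target, absent from other
    return s_target / s_other >= min_growth


# Toy data (hypothetical): two groups of character sequences.
group_a = ["abcab", "acbb", "aab"]
group_b = ["bca", "cba", "abc"]

print(is_emerging("ab", group_a, group_b))
print(is_emerging("c", group_a, group_b))
```

Under these assumptions, "ab" occurs in every sequence of `group_a` but in only one of three sequences of `group_b`, so it qualifies as emerging for `group_a`, whereas "c" is more frequent in `group_b` and does not. Enumerating every candidate pattern this way is expensive, which is the efficiency problem the abstract attributes to ES extraction.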