Contrasting Sequence Groups by Emerging Sequences

Authors:
Kang Deng; Osmar R. Zaïane
Affiliations:
Department of Computing Science, University of Alberta, Edmonton, Alberta T6G 2E8; Department of Computing Science, University of Alberta, Edmonton, Alberta T6G 2E8
Venue:
DS '09 Proceedings of the 12th International Conference on Discovery Science
Year:
2009

Abstract

Group comparison is a fundamental task in many scientific endeavours and is also the basis of any classifier. Contrast sets and emerging patterns capture differences between groups of categorical data. Comparing groups of sequence data is equally relevant in many applications. We define Emerging Sequences (ESs) as subsequences that are frequent in the sequences of one group and less frequent in the sequences of another, and that therefore distinguish, or contrast, sequences of different classes. Distinguishing sequence classes poses two challenges: extracting ESs efficiently is not trivial, and only exact matches of sequences are considered. We address these problems with a suffix tree-based framework and a similar-matching mechanism. We also propose a classifier based on Emerging Sequences. Evaluated against two learning algorithms based on frequent subsequences and exact-matching subsequences, our model outperforms the baseline approaches by up to 20% in prediction accuracy in experiments on two datasets.
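
The definition of an Emerging Sequence given in the abstract can be illustrated with a small sketch. It is not the suffix tree-based framework of the paper: it enumerates candidates by brute force and uses exact matching only. The function names, the thresholds `min_sup` and `max_sup`, the length cap `max_len`, the toy data, and the choice to allow gaps between matched items are all assumptions made for illustration.

```python
from itertools import combinations


def is_subsequence(pattern, sequence):
    """True if the items of pattern occur in sequence in order (gaps allowed)."""
    it = iter(sequence)
    # 'item in it' consumes the iterator up to the match, which enforces order.
    return all(item in it for item in pattern)


def support(pattern, group):
    """Fraction of the sequences in group that contain pattern."""
    return sum(is_subsequence(pattern, s) for s in group) / len(group)


def emerging_sequences(target, other, min_sup, max_sup, max_len=3):
    """Brute-force sketch: patterns frequent in target and infrequent in other.

    min_sup and max_sup are hypothetical thresholds, not values from the paper.
    """
    # Candidates: every subsequence of length <= max_len of a target sequence.
    candidates = set()
    for seq in target:
        for k in range(1, max_len + 1):
            for idx in combinations(range(len(seq)), k):
                candidates.add("".join(seq[i] for i in idx))
    return sorted(
        p for p in candidates
        if support(p, target) >= min_sup and support(p, other) <= max_sup
    )


# Toy data (made up): two groups of character sequences.
target = ["abcd", "abd", "acbd"]
other = ["dcba", "bad", "cab"]

print(emerging_sequences(target, other, min_sup=1.0, max_sup=0.0))  # -> ['abd']
```

In this toy example "abd" occurs in every target sequence and in none of the others, so it contrasts the two groups. Brute-force enumeration grows exponentially with `max_len`, which is the efficiency problem the abstract raises.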