Mining Minimal Distinguishing Subsequence Patterns with Gap Constraints
ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
Discovering contrasts between collections of data is an important task in data mining. In this paper, we introduce a new type of contrast pattern, called a Minimal Distinguishing Subsequence (MDS). An MDS is a minimal subsequence that occurs frequently in sequences of one class and infrequently in sequences of another class. It is a natural way of representing strong and succinct contrast information between two sequential datasets, and it can be useful in applications such as protein comparison, document comparison and building sequential classification models. Mining MDS patterns is challenging and differs significantly from mining contrasts between relational or transactional data. One particularly important type of constraint that can be integrated into the mining process is the gap constraint. We present an efficient algorithm called ConSGapMiner (Contrast Sequences with Gap Miner) to mine all MDSs satisfying a minimum and maximum gap constraint, plus a maximum length constraint. It uses efficient bitset and Boolean operations to perform powerful gap-based pruning within a prefix-growth framework. A performance evaluation with both sparse and dense datasets demonstrates the scalability of ConSGapMiner and shows that it can mine patterns from high-dimensional datasets at low support thresholds.
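To make the MDS definition and the bitset idea more concrete, here is a minimal sketch in Python. It is not ConSGapMiner itself: it only tests a single candidate pattern and does not enumerate patterns by prefix growth. Several details are assumptions: a gap is taken to be the number of sequence positions skipped between two consecutive pattern elements, support is the fraction of sequences in a class that contain the pattern under the gap constraint, a pattern is a contrast if its support is at least `delta` in the positive class and at most `alpha` in the negative class, and minimality is checked by brute force over all proper non-empty subsequences (feasible only because a maximum length constraint keeps patterns short). All function names are hypothetical. Python integers serve as bitsets: bit `i` marks position `i` of a sequence.

```python
from itertools import combinations
from typing import Dict, List, Sequence

Masks = Dict[str, int]  # item -> bitset of positions where that item occurs


def item_masks(seq: Sequence[str]) -> Masks:
    """Map each item to a bitset whose bit i is set iff seq[i] == item."""
    masks: Masks = {}
    for i, item in enumerate(seq):
        masks[item] = masks.get(item, 0) | (1 << i)
    return masks


def end_positions(pattern: Sequence[str], masks: Masks,
                  min_gap: int, max_gap: int) -> int:
    """Bitset of positions where a gap-constrained occurrence of pattern ends.

    Assumed gap semantics: between consecutive pattern elements matched at
    positions i < j, the number of skipped positions j - i - 1 must lie in
    [min_gap, max_gap].
    """
    if min_gap < 0 or min_gap > max_gap:
        raise ValueError("need 0 <= min_gap <= max_gap")
    if not pattern:
        return 0
    ends = masks.get(pattern[0], 0)
    for item in pattern[1:]:
        # Shift every current end position forward by each allowed distance,
        # then keep only positions that actually hold the next item (bitwise AND).
        reach = 0
        for shift in range(min_gap + 1, max_gap + 2):
            reach |= ends << shift
        ends = reach & masks.get(item, 0)
        if not ends:  # gap-based pruning: no occurrence survives, stop early
            break
    return ends


def support(pattern: Sequence[str], dataset: List[Masks],
            min_gap: int, max_gap: int) -> float:
    """Fraction of (non-empty) dataset sequences containing the pattern."""
    hits = sum(1 for m in dataset if end_positions(pattern, m, min_gap, max_gap))
    return hits / len(dataset)


def is_contrast(pattern, pos, neg, delta, alpha, min_gap, max_gap) -> bool:
    """Frequent (>= delta) in pos and infrequent (<= alpha) in neg."""
    return (support(pattern, pos, min_gap, max_gap) >= delta
            and support(pattern, neg, min_gap, max_gap) <= alpha)


def is_mds(pattern, pos, neg, delta, alpha, min_gap, max_gap) -> bool:
    """Contrast pattern none of whose proper subsequences is a contrast (brute force)."""
    if not is_contrast(pattern, pos, neg, delta, alpha, min_gap, max_gap):
        return False
    k = len(pattern)
    for r in range(1, k):
        for idx in combinations(range(k), r):
            sub = [pattern[i] for i in idx]
            if is_contrast(sub, pos, neg, delta, alpha, min_gap, max_gap):
                return False
    return True


if __name__ == "__main__":
    pos = [item_masks(s) for s in ["abc", "abd", "xab"]]
    neg = [item_masks(s) for s in ["acb", "bac", "ccc"]]
    print(is_mds("ab", pos, neg, 0.6, 0.2, 0, 0))
    print(is_mds("ab", pos, neg, 0.6, 0.2, 0, 1))
```

With a maximum gap of 0, "ab" occurs in every positive sequence and in no negative one, while "a" and "b" alone each occur in two of the three negative sequences, so "ab" is reported as an MDS. Relaxing the maximum gap to 1 lets "ab" match "acb" as well, and the pattern no longer satisfies the negative-support threshold. The shift-and-AND step is the reason bitsets suit gap constraints: extending every partial occurrence by one element costs a handful of word-level operations rather than a scan over positions.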