Pattern discovery in sequences is an important problem in many applications, especially in computational biology and text mining. However, because such data are noisy, the traditional sequential pattern model may fail to capture the underlying characteristics of sequence data in these applications. There are two challenges. First, mutation noise exists in the data, so symbols may be misrepresented by other symbols. Second, the order of symbols in sequences may be permuted. To address these problems, this paper proposes a new sequential pattern model called mutable permutation patterns. Since the Apriori property does not hold for the permutation pattern model, a novel Permu-pattern algorithm is devised to mine frequent mutable permutation patterns from sequence databases, and a reachability property is identified to prune the candidate set. Last but not least, we apply the permutation pattern model to a real genome dataset to discover gene clusters, which shows the effectiveness of the model. A large amount of synthetic data is also used to demonstrate the efficiency of the Permu-pattern algorithm.
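The abstract does not give the formal definitions, so the following is only a minimal sketch under stated assumptions, not the Permu-pattern algorithm. It assumes that a permutation pattern occurs in a sequence when some contiguous window of the same length holds the pattern's symbols in any order (in the spirit of common-interval gene clusters), and it models mutation noise with a hypothetical `max_mutations` parameter bounding how many pattern symbols may be substituted in that window. Under this assumed definition, the small database below shows why the Apriori property can fail: the longer pattern `abc` is more frequent than its sub-pattern `ab`.

```python
from collections import Counter
from typing import Sequence


def occurs(pattern: Sequence[str], seq: Sequence[str], max_mutations: int = 0) -> bool:
    """Hypothetical occurrence test: True if some window of len(pattern) in seq
    contains the pattern's symbols in any order, allowing up to max_mutations
    pattern symbols to be replaced by other symbols."""
    k = len(pattern)
    if k == 0 or k > len(seq):
        return False
    target = Counter(pattern)
    for start in range(len(seq) - k + 1):
        window = Counter(seq[start:start + k])
        # Window and pattern have equal length, so each pattern symbol missing
        # from the window corresponds to one substituted (mutated) symbol.
        missing = sum((target - window).values())
        if missing <= max_mutations:
            return True
    return False


def support(pattern: Sequence[str], database: Sequence[Sequence[str]], max_mutations: int = 0) -> int:
    """Number of sequences in the database in which the pattern occurs."""
    return sum(occurs(pattern, seq, max_mutations) for seq in database)


# Toy database, invented for illustration only.
DB = ["acb", "bca", "abx"]

print(support("abc", DB))                   # 2: windows "acb" and "bca"
print(support("ab", DB))                    # 1: only "abx" has a contiguous {a, b} window
print(support("abc", DB, max_mutations=1))  # 3: "abx" matches with c mutated to x
```

Because `abc` (support 2) is more frequent than its sub-pattern `ab` (support 1), a level-wise Apriori pruning of infrequent sub-patterns would wrongly discard `abc`, which is why a different pruning rule is needed for this kind of model.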