Permu-pattern: discovery of mutable permutation patterns with proximity constraint

Authors:
Meng Hu;Jiong Yang;Wei Su
Affiliations:
Case Western Reserve University, Cleveland, OH, USA;Case Western Reserve University, Cleveland, OH, USA;Case Western Reserve University, Cleveland, OH, USA
Venue:
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2008

Citing 13
Cited 2

SPADE: an efficient algorithm for mining frequent sequences

Machine Learning
Mining long sequential patterns in a noisy environment

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Mining sequential patterns with constraints in large databases

Proceedings of the eleventh international conference on Information and knowledge management
Mining Sequential Patterns: Generalizations and Performance Improvements

EDBT '96 Proceedings of the 5th International Conference on Extending Database Technology: Advances in Database Technology
Mining Sequential Patterns

ICDE '95 Proceedings of the Eleventh International Conference on Data Engineering
PrefixSpan: Mining Sequential Patterns by Prefix-Projected Growth

Proceedings of the 17th International Conference on Data Engineering
SPIRIT: Sequential Pattern Mining with Regular Expression Constraints

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Finding All Common Intervals of k Permutations

CPM '01 Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching
Scalable sequential pattern mining for biological sequences

Proceedings of the thirteenth ACM international conference on Information and knowledge management
Mining Sequential Patterns from Large Data Sets (The Kluwer International Series on Advances in Database Systems)

Mining Sequential Patterns from Large Data Sets (The Kluwer International Series on Advances in Database Systems)
Parallel mining of closed sequential patterns

Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Efficiently Mining Frequent Closed Partial Orders

ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
On the similarity of sets of permutations and its applications to genome comparison

COCOON'03 Proceedings of the 9th annual international conference on Computing and combinatorics

Efficient algorithms for mining constrained frequent patterns from uncertain data

Proceedings of the 1st ACM SIGKDD Workshop on Knowledge Discovery from Uncertain Data
Efficient algorithms for the mining of constrained frequent patterns from uncertain data

ACM SIGKDD Explorations Newsletter

Quantified Score

Hi-index	0.00

Visualization

Abstract

Pattern discovery in sequences is an important problem in many applications, especially in computational biology and text mining. However, due to the noisy nature of data, the traditional sequential pattern model may fail to reflect the underlying characteristics of sequence data in these applications. There are two challenges: First, the mutation noise exists in the data, and therefore symbols may be misrepresented by other symbols; Secondly, the order of symbols in sequences could be permutated. To address the above problems, in this paper we propose a new sequential pattern model called mutable permutation patterns. Since the Apriori property does not hold for our permutation pattern model, a novel Permu-pattern algorithm is devised to mine frequent mutable permutation patterns from sequence databases. A reachability property is identified to prune the candidate set. Last but not least, we apply the permutation pattern model to a real genome dataset to discover gene clusters, which shows the effectiveness of the model. A large amount of synthetic data is also utilized to demonstrate the efficiency of the Permu-pattern algorithm.