Pattern discovery in long sequences is of great importance in many applications, including computational biology, consumer behavior analysis, and system performance analysis. In a noisy environment, an observed sequence may not accurately reflect the underlying behavior. For example, in a protein sequence, the amino acid N is likely to mutate to D with little impact on the biological function of the protein. It would be desirable to relate an occurrence of D in the observation to a possible mutation from N in an appropriate manner. Unfortunately, the support measure (i.e., the number of occurrences) of a pattern does not serve this purpose. In this paper, we introduce the concept of a compatibility matrix as a means of providing a probabilistic connection from the observation to the underlying true value. A new metric, match, is also proposed to capture the "real support" of a pattern, that is, the support that would be expected in a noise-free environment. In addition, in the context we address, a pattern can be very long, so the standard pruning technique developed for the market basket problem may not work efficiently. We therefore devise a novel algorithm that combines statistical sampling with a new technique, border collapsing, to discover long patterns in a minimal number of scans of the sequence database with sufficiently high confidence. Empirical results demonstrate the robustness of the match model with respect to noise and the efficiency of the probabilistic algorithm.
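To make the difference between support and match concrete, the following is a minimal sketch of one plausible reading of the two measures. The abstract does not give the formal definitions, so everything here is an assumption for illustration: the matrix values in `COMPAT`, the function names, the choice to score a pattern against contiguous windows by multiplying per-position compatibilities, taking the best window per sequence, and averaging over the database. It is not the paper's algorithm and omits sampling and border collapsing entirely.

```python
from math import prod

# Hypothetical compatibility matrix (values invented for illustration).
# COMPAT[true][observed] is the assumed probability that `true` is the
# underlying symbol given that `observed` was seen. For each observed
# symbol the entries over all true symbols sum to 1.
COMPAT = {
    "N": {"N": 0.9, "D": 0.3},
    "D": {"N": 0.1, "D": 0.7},
    "K": {"K": 1.0},
}


def segment_match(pattern, segment, compat=COMPAT):
    """Product of per-position compatibilities (assumed definition)."""
    return prod(compat.get(p, {}).get(s, 0.0) for p, s in zip(pattern, segment))


def sequence_match(pattern, sequence, compat=COMPAT):
    """Best score of the pattern over all contiguous windows of the sequence."""
    k = len(pattern)
    if k == 0 or k > len(sequence):
        return 0.0
    return max(
        segment_match(pattern, sequence[i:i + k], compat)
        for i in range(len(sequence) - k + 1)
    )


def database_match(pattern, database, compat=COMPAT):
    """Average per-sequence match over the database (assumed aggregation)."""
    if not database:
        return 0.0
    return sum(sequence_match(pattern, s, compat) for s in database) / len(database)


def support(pattern, database):
    """Number of sequences containing the pattern as an exact substring."""
    return sum(pattern in s for s in database)


if __name__ == "__main__":
    observed = ["KNK", "KDK", "KKK"]
    print(support("KNK", observed))
    print(database_match("KNK", observed))
```

Under these assumed values, the sequence `KDK` contributes nothing to the support of `KNK`, yet it contributes a nonzero amount to the match because an observed D is treated as a possible mutation of N.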