Constraint-based mining techniques on sequence databases have been studied extensively over the last few years. Efficient algorithms now make it possible to compute complete collections of patterns (e.g., sequences) that satisfy conjunctions of monotonic and/or anti-monotonic constraints. While studying new applications of these techniques, we came to believe that a primitive constraint enforcing sufficient similarity w.r.t. a given reference sequence would be extremely useful and should benefit from this recent algorithmic breakthrough. However, a non-trivial similarity constraint is neither monotonic nor anti-monotonic. We have therefore studied its definition as a conjunction of two constraints that do satisfy the desired monotonicity properties. A pattern is called similar to a reference pattern x when its longest common subsequence (LCS) with x is long enough (the monotonic part) and when the number of deletions needed to turn the pattern into that LCS is small enough (the anti-monotonic part). We provide an experimental validation on a biological database that confirms the added value of this approach. Classical issues such as scalability and pruning efficiency are also discussed.
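The two-part similarity constraint can be sketched in a few lines. The sketch below is a minimal illustration, not the authors' implementation. It assumes patterns and the reference are plain sequences of single symbols, such as DNA strings, and it computes the LCS length with the classic dynamic-programming recurrence. The function names `lcs_length`, `satisfies_lcs_part`, `satisfies_deletion_part`, and `is_similar`, and the thresholds `min_lcs` and `max_deletions`, are hypothetical. The monotonic part holds because extending a pattern can never shorten its LCS with x. The anti-monotonic part holds because removing one symbol from a pattern shortens the pattern by 1 and its LCS by at most 1, so the deletion count cannot grow.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of a longest common subsequence of a and b (O(|a|*|b|) DP)."""
    # prev[j] = LCS length of the prefix of `a` processed so far and b[:j]
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr[j] = prev[j - 1] + 1  # extend a common subsequence
            else:
                curr[j] = max(prev[j], curr[j - 1])  # skip a symbol of a or b
        prev = curr
    return prev[-1]


def satisfies_lcs_part(pattern: Sequence[str], reference: Sequence[str],
                       min_lcs: int) -> bool:
    """Monotonic part (hypothetical name): the LCS with the reference is long enough."""
    return lcs_length(pattern, reference) >= min_lcs


def satisfies_deletion_part(pattern: Sequence[str], reference: Sequence[str],
                            max_deletions: int) -> bool:
    """Anti-monotonic part (hypothetical name): few deletions turn pattern into its LCS."""
    return len(pattern) - lcs_length(pattern, reference) <= max_deletions


def is_similar(pattern: Sequence[str], reference: Sequence[str],
               min_lcs: int, max_deletions: int) -> bool:
    """Similarity as the conjunction of the two parts (thresholds are assumptions)."""
    return (satisfies_lcs_part(pattern, reference, min_lcs)
            and satisfies_deletion_part(pattern, reference, max_deletions))


if __name__ == "__main__":
    reference = "ACGTAC"
    # "AGTC" is itself a subsequence of the reference: LCS 4, 0 deletions.
    print(is_similar("AGTC", reference, min_lcs=3, max_deletions=1))
    # "AGTTTC" has LCS 4 but needs 2 deletions, so the anti-monotonic part fails.
    print(is_similar("AGTTTC", reference, min_lcs=3, max_deletions=1))
    # "AC" needs no deletions but its LCS (2) is too short.
    print(is_similar("AC", reference, min_lcs=3, max_deletions=1))
```

Keeping the two parts as separate predicates mirrors the motivation in the abstract. A level-wise or prefix-growth miner can push `satisfies_deletion_part` down to prune every extension of a failing pattern, and it can use `satisfies_lcs_part` in the opposite direction. A production miner would also reuse the DP rows across pattern extensions instead of recomputing the LCS from scratch.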