Looking for monotonicity properties of a similarity constraint on sequences

Authors:
Ieva Mitasiunaite;Jean-François Boulicaut
Affiliations:
INSA Lyon, LIRIS CNRS UMR, Villeurbanne, France;INSA Lyon, LIRIS CNRS UMR, Villeurbanne, France
Venue:
Proceedings of the 2006 ACM symposium on Applied computing
Year:
2006


Abstract

Constraint-based mining techniques on sequence databases have been studied extensively over the last few years, and efficient algorithms now make it possible to compute complete collections of patterns (e.g., sequences) that satisfy conjunctions of monotonic and/or anti-monotonic constraints. Studying new applications of these techniques, we believe that a primitive constraint that enforces enough similarity w.r.t. a given reference sequence would be extremely useful and should benefit from this recent algorithmic breakthrough. A nontrivial similarity constraint is, however, neither monotonic nor anti-monotonic. Therefore, we have studied its definition as a conjunction of two constraints that satisfy the desired monotonicity properties: a pattern is called similar to a reference pattern x when its longest common subsequence (LCS) with x is long enough (the monotonic part) and when the number of deletions needed to turn it into that LCS is small enough (the anti-monotonic part). We provide an experimental validation that confirms the added value of this approach on a biological database. Classical issues such as scalability and pruning efficiency are discussed.
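
The decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `lcs_length` and `is_similar` and the thresholds `min_lcs` and `max_deletions` are hypothetical, and sequences are assumed to be plain strings. The sketch only checks the two-part constraint for a single candidate pattern and does not perform any mining or pruning.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (standard DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def is_similar(pattern: str, reference: str,
               min_lcs: int, max_deletions: int) -> bool:
    """Hypothetical similarity test: a conjunction of two constraints.

    min_lcs and max_deletions are assumed thresholds chosen by the user.
    """
    common = lcs_length(pattern, reference)
    # Monotonic part: extending the pattern to a super-sequence
    # can never shrink its LCS with the reference.
    monotonic_part = common >= min_lcs
    # Anti-monotonic part: the number of symbols to delete from the
    # pattern to obtain the LCS never grows when taking a subsequence.
    anti_monotonic_part = (len(pattern) - common) <= max_deletions
    return monotonic_part and anti_monotonic_part
```

For example, with the reference `"AGT"`, the pattern `"ACGT"` shares an LCS of length 3 and needs one deletion, so `is_similar("ACGT", "AGT", 3, 1)` holds while `is_similar("ACGT", "AGT", 3, 0)` does not.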