In many application domains (e.g., WWW mining, molecular biology), large string datasets are available yet remain under-exploited. The inductive database framework assumes that both such datasets and the various patterns holding within them can be queried. In this setting, queries that return patterns are called inductive queries, and solving them is one of the core research topics in data mining. Constraint-based mining techniques on string datasets have indeed been studied extensively. Efficient algorithms can compute complete collections of patterns (e.g., substrings) that satisfy conjunctions of monotonic and/or anti-monotonic constraints in large datasets (e.g., conjunctions of minimal and maximal support constraints).

We consider fault tolerance and softness to be extremely important for tackling real-life data analysis. We address some of the open problems in evaluating soft-support constraints, which require computing soft occurrences of patterns instead of the classical exact-matching occurrences. Solving soft-support constraints efficiently is challenging because soft support prevents the clever use of monotonicity properties. We describe our proposal and provide an experimental validation on real-life clickstream data that confirms the added value of this approach.
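To make the contrast between exact and soft support concrete, here is a minimal sketch. It is not the authors' method. It assumes that the support of a substring is the number of database strings containing it. It also assumes that a "soft occurrence" means some substring of the text lies within edit distance `k` of the pattern, checked with a Sellers-style dynamic program. The abstract does not give the paper's own definition of softness, so this is a stand-in.

The mining routine is a plain levelwise search under a conjunction of minimal support (anti-monotonic, so it is used for pruning) and maximal support (monotonic, used only as a filter). Soft support is only evaluated per pattern. The sketch does not attempt the paper's strategy for handling soft-support constraints. All function names and the toy data are hypothetical.

```python
from typing import List, Set


def occurs_softly(pattern: str, text: str, k: int) -> bool:
    """True if some substring of `text` is within edit distance `k` of `pattern`.

    Sellers-style DP: row 0 is all zeros, so a match may start anywhere in
    `text`. With k == 0 this reduces to exact substring matching.
    (Assumed notion of soft occurrence, not taken from the paper.)
    """
    prev = [0] * (len(text) + 1)
    for i in range(1, len(pattern) + 1):
        cur = [i] + [0] * len(text)
        for j in range(1, len(text) + 1):
            cost = 0 if pattern[i - 1] == text[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost,  # match or substitution
                         prev[j] + 1,         # pattern symbol deleted
                         cur[j - 1] + 1)      # text symbol inserted
        prev = cur
    return min(prev) <= k


def exact_support(pattern: str, db: List[str]) -> int:
    """Number of strings in db containing `pattern` as a contiguous substring."""
    return sum(pattern in s for s in db)


def soft_support(pattern: str, db: List[str], k: int) -> int:
    """Number of strings in db with a soft occurrence of `pattern` (edit distance <= k)."""
    return sum(occurs_softly(pattern, s, k) for s in db)


def mine_substrings(db: List[str], min_sup: int, max_sup: int) -> Set[str]:
    """Levelwise search for substrings with min_sup <= exact support <= max_sup.

    Minimal support is anti-monotonic (a superstring never occurs in more
    strings than its substrings), so failing candidates are never extended.
    Maximal support is monotonic and is only applied as an output filter here.
    """
    if min_sup < 1:
        raise ValueError("min_sup must be >= 1 so the search terminates")
    alphabet = sorted(set("".join(db)))
    level = [c for c in alphabet if exact_support(c, db) >= min_sup]
    result: Set[str] = set()
    while level:
        result.update(p for p in level if exact_support(p, db) <= max_sup)
        # Right-extend every surviving pattern by one symbol, pruning on min_sup.
        level = [p + c for p in level for c in alphabet
                 if exact_support(p + c, db) >= min_sup]
    return result


if __name__ == "__main__":
    # Hypothetical toy "clickstreams"; not data from the paper.
    db = ["abcab", "abd", "bcab", "xyz"]
    print(sorted(mine_substrings(db, min_sup=2, max_sup=2)))
    # -> ['bc', 'bca', 'bcab', 'c', 'ca', 'cab']
    print(exact_support("abd", db), soft_support("abd", db, k=1))  # -> 1 3
```

In the toy run, `abd` occurs exactly in only one string. With one edit allowed, it also softly occurs in `abcab` and `bcab` through their `ab` substrings, so its soft support rises to 3. Gaps like this are what soft-support constraints are meant to capture.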