Introducing Softness into Inductive Queries on String Databases

Authors:
Ieva Mitasiunaite;Jean-François Boulicaut
Affiliations:
INSA Lyon, LIRIS CNRS UMR 5205, 69621 Villeurbanne cedex, France, {Ieva.Mitasiunaite,Jean-Francois.Boulicaut}@insa-lyon.fr
Venue:
Proceedings of the 2007 conference on Databases and Information Systems IV: Selected Papers from the Seventh International Baltic Conference DB&IS'2006
Year:
2007

Abstract

In many application domains (e.g., WWW mining, molecular biology), large string datasets are available and yet under-exploited. The inductive database framework assumes that both such datasets and the various patterns holding within them might be queryable. In this setting, queries that return patterns are called inductive queries, and solving them is one of the core research topics in data mining. Indeed, constraint-based mining techniques on string datasets have been studied extensively. Efficient algorithms make it possible to compute complete collections of patterns (e.g., substrings) that satisfy conjunctions of monotonic and/or anti-monotonic constraints in large datasets (e.g., conjunctions of minimal and maximal support constraints). We consider fault-tolerance and softness to be extremely important issues in real-life data analysis. We address some of the open problems in evaluating soft-support constraints, which require computing soft-occurrences of patterns instead of the classical exact-matching ones. Solving soft-support constraints efficiently is challenging because softness prevents the clever use of monotonicity properties. We describe our proposal and provide an experimental validation on real-life clickstream data that confirms the added value of this approach.
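
To make the contrast between exact and soft occurrences concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' definition or algorithm: support is assumed to be the number of strings containing the pattern, and a soft occurrence is assumed to be a same-length substring within a Hamming distance threshold of the pattern. The function names, the `max_dist` parameter, and the toy data are all hypothetical.

```python
def exact_support(pattern, strings):
    """Number of strings that contain `pattern` as an exact substring."""
    return sum(pattern in s for s in strings)


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def soft_support(pattern, strings, max_dist):
    """Number of strings with at least one soft occurrence of `pattern`.

    Assumption: a soft occurrence is a substring of the same length as
    `pattern` whose Hamming distance to it is at most `max_dist`.
    """
    n = len(pattern)
    count = 0
    for s in strings:
        windows = (s[i:i + n] for i in range(len(s) - n + 1))
        if any(hamming(pattern, w) <= max_dist for w in windows):
            count += 1
    return count


def within_support_bounds(support, min_sup, max_sup):
    """Conjunction of a minimal and a maximal support constraint."""
    return min_sup <= support <= max_sup


# Toy dataset (hypothetical, not the clickstream data from the paper).
strings = ["abcab", "abdab", "xbcax", "zzzzz"]

exact = exact_support("abc", strings)
soft = soft_support("abc", strings, max_dist=1)
print(exact, soft)
print(within_support_bounds(exact, 2, 3), within_support_bounds(soft, 2, 3))
```

With one tolerated mismatch, the pattern "abc" is also matched in "abdab" (via "abd") and "xbcax" (via "xbc"), so a pattern that fails a minimal support threshold of 2 under exact matching can satisfy it under soft matching. The brute-force scan over all windows is only meant to show what is being counted; it says nothing about how such constraints can be evaluated efficiently.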