Optimization of a language for data mining

Authors:
Rosa Meo
Affiliations:
Università degli Studi di Torino, corso Svizzera 185 - 10149 - Torino - Italy
Venue:
Proceedings of the 2003 ACM symposium on Applied computing
Year:
2003

Citing 13
Cited 7

A database perspective on knowledge discovery

Communications of the ACM
Exploratory mining and pruning optimizations of constrained associations rules

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Optimization of constrained frequent set queries with 2-variable constraints

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Can we push more constraints into frequent pattern mining?

Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Levelwise Search and Borders of Theories in Knowledge Discovery

Data Mining and Knowledge Discovery
An Extension to SQL for Mining Association Rules

Data Mining and Knowledge Discovery
On the Complexity of Mining Quantitative Association Rules

Data Mining and Knowledge Discovery
Maintenance of Discovered Association Rules in Large Databases: An Incremental Updating Technique

ICDE '96 Proceedings of the Twelfth International Conference on Data Engineering
Mining Frequent Item Sets with Convertible Constraints

Proceedings of the 17th International Conference on Data Engineering
Using Condensed Representations for Interactive Association Rule Mining

PKDD '02 Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery
A New SQL-like Operator for Mining Association Rules

VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
Constraint-Based Discovery and Inductive Queries: Application to Association Rule Mining

Proceedings of the ESF Exploratory Workshop on Pattern Detection and Discovery
Optimization of association rule mining queries

Intelligent Data Analysis

Answering constraint-based mining queries on itemsets using previous materialized results

Journal of Intelligent Information Systems
Efficient online mining of large databases

International Journal of Business Information Systems
Three strategies for concurrent processing of frequent itemset queries using FP-growth

KDID'06 Proceedings of the 5th international conference on Knowledge discovery in inductive databases
Inductive databases and constraint-based data mining

ICFCA'11 Proceedings of the 9th international conference on Formal concept analysis
A greedy approach to concurrent processing of frequent itemset queries

DaWaK'06 Proceedings of the 8th international conference on Data Warehousing and Knowledge Discovery
Partition-Based approach to processing batches of frequent itemset queries

FQAS'06 Proceedings of the 7th international conference on Flexible Query Answering Systems
A Framework for Synthesizing Arbitrary Boolean Queries Induced by Frequent Itemsets

International Journal of Knowledge-Based Organizations

Abstract

Constraint-based mining has attracted the interest of the data mining research community in recent years because it increases the relevance of the result set and reduces both its volume and the workload. However, constraint-based mining will be fully feasible only when efficient optimizers for mining languages become available. This paper is a first step towards the construction of optimizers for a constraint-based mining language. It provides guidelines for comparing classes of statements by means of the relationships that exist between their result sets. Furthermore, it identifies the presence of unique constraints and functional dependencies in the database schema as information useful to optimization. We show the practical implications of these principles with a set of algorithms designed for a specific mining language. These algorithms also use a newly designed index, called the mining index, which reduces the portion of the database that must be read in response to some classes of queries. In these cases the workload of the mining engine is greatly reduced and, in a significant subset of them, avoided altogether.
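
As a rough illustration of how a relationship between result sets can spare the mining engine work, the sketch below answers a stricter frequent-itemset query by filtering the stored result of a looser one instead of rescanning the data. It is not the paper's algorithm or mining language: the function names `frequent_itemsets` and `refine`, the toy transactions, and the use of a minimum-support threshold as the only constraint are all assumptions made for this example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force miner (illustrative only): scans every transaction and
    returns {itemset: support} for itemsets with support >= min_support."""
    counts = {}
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    total = len(transactions)
    return {s: c / total for s, c in counts.items() if c / total >= min_support}


def refine(previous_result, min_support):
    """Answer a stricter query from a stored result without reading the data.
    Valid only if previous_result was mined on the same data with a
    threshold no higher than min_support."""
    return {s: sup for s, sup in previous_result.items() if sup >= min_support}


# Hypothetical toy data set.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

loose = frequent_itemsets(transactions, 0.25)          # full scan of the data
strict_from_cache = refine(loose, 0.5)                 # no scan needed
strict_from_scratch = frequent_itemsets(transactions, 0.5)

# The stricter result is contained in the looser one, so filtering suffices.
assert strict_from_cache == strict_from_scratch
```

The shortcut works because raising the support threshold can only shrink the result set, so the second query's result is contained in the first. Constraints without this containment property would need a different treatment.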