A parallel, distributed algorithm for relational frequent pattern discovery from very large data sets

Authors:
Annalisa Appice;Michelangelo Ceci;Antonio Turi;Donato Malerba
Affiliations:
Dipartimento di Informatica, Università degli Studi di Bari, Bari, Italy (all authors). Corresponding e-mail: appice@di.uniba.it
Venue:
Intelligent Data Analysis - Ubiquitous Knowledge Discovery
Year:
2011

Abstract

The amount of data produced by ubiquitous computing applications is growing quickly, owing to the pervasive presence of small devices endowed with sensing, computing and communication capabilities. The heterogeneity and strong interdependence that characterize 'ubiquitous data' require a (multi-)relational approach to their analysis. However, relational data mining algorithms do not scale well, and very large data sets can hardly be processed. In this paper we propose an extension of a relational algorithm for multi-level frequent pattern discovery that resorts to data sampling and distributed computation in Grid environments in order to overcome the computational limits of the original serial algorithm. The set of patterns discovered by the new algorithm approximates the set of exact solutions found by the serial algorithm. The quality of the approximation depends on three parameters: the proportion of data in each sample, the minimum support thresholds, and the number of samples in which a pattern has to be frequent in order to be considered globally frequent. Since the first two parameters are difficult to control, we focus our investigation on the third. The theoretically derived conclusions are also confirmed experimentally. Moreover, an additional application in the context of event log mining shows the viability of the proposed approach to relational frequent pattern mining from very large data sets.
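The role of the third parameter can be illustrated with a small sketch. It is not the authors' algorithm: it assumes plain itemsets in place of relational multi-level patterns, a brute-force miner, and a sequential loop where the paper distributes the work over Grid nodes. All function and parameter names (frequent_itemsets, globally_frequent, k, sample_fraction) are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations
import random


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return the itemsets (as frozensets) of size <= max_size whose
    relative support in `transactions` is at least `min_support`.
    Brute-force stand-in for a real (relational) pattern miner."""
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        items = sorted(t)
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {p for p, c in counts.items() if c / n >= min_support}


def globally_frequent(transactions, n_samples, sample_fraction,
                      min_support, k, seed=0):
    """Mine each random sample independently and keep the patterns
    that are frequent in at least `k` of the `n_samples` samples."""
    rng = random.Random(seed)
    sample_size = max(1, int(len(transactions) * sample_fraction))
    votes = Counter()
    # Each iteration is independent, so in a distributed setting it
    # could run on a separate node (assumption; sequential here).
    for _ in range(n_samples):
        sample = rng.sample(transactions, sample_size)
        for pattern in frequent_itemsets(sample, min_support):
            votes[pattern] += 1
    return {p for p, v in votes.items() if v >= k}


if __name__ == "__main__":
    data = [{"a", "b"}] * 8 + [{"c"}] * 2
    strict = globally_frequent(data, n_samples=5, sample_fraction=0.5,
                               min_support=0.5, k=5)
    loose = globally_frequent(data, n_samples=5, sample_fraction=0.5,
                              min_support=0.5, k=1)
    print(sorted(sorted(p) for p in strict))
    print(sorted(sorted(p) for p in loose))
```

In this sketch, raising k can only shrink the result set for a fixed set of samples: a larger k tends to discard patterns that are frequent only by chance in a few samples, while a smaller k tends to admit more of them. This is the kind of trade-off that the number-of-samples parameter controls in the approach described above.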