A parallel algorithm for computing borders

Authors:
Nicolas Hanusse;Sofian Maabout
Affiliations:
CNRS-UMR5800, University of Bordeaux, France;CNRS-UMR5800, University of Bordeaux, France
Venue:
Proceedings of the 20th ACM international conference on Information and knowledge management
Year:
2011

Citing 26
Cited 0

Efficiently mining long patterns from databases
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data

Depth first generation of long patterns
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining

Levelwise Search and Borders of Theories in Knowledge Discovery
Data Mining and Knowledge Discovery

New Results on Monotone Dualization and Generating Hypergraph Transversals
SIAM Journal on Computing

Efficient Discovery of Functional Dependencies and Armstrong Relations
EDBT '00 Proceedings of the 7th International Conference on Extending Database Technology: Advances in Database Technology

Pincer Search: A New Algorithm for Discovering the Maximum Frequent Set
EDBT '98 Proceedings of the 6th International Conference on Extending Database Technology: Advances in Database Technology

MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases
Proceedings of the 17th International Conference on Data Engineering

FUN: An Efficient Algorithm for Mining Functional and Embedded Dependencies
ICDT '01 Proceedings of the 8th International Conference on Database Theory

Fast Algorithms for Mining Association Rules in Large Databases
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases

FastFDs: A Heuristic-Driven, Depth-First Algorithm for Mining Functional Dependencies from Relation Instances - Extended Abstract
DaWaK '01 Proceedings of the Third International Conference on Data Warehousing and Knowledge Discovery

Discovering all most specific sentences
ACM Transactions on Database Systems (TODS)

gSpan: Graph-Based Substructure Pattern Mining
ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining

Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach
Data Mining and Knowledge Discovery

Fast Algorithms for Frequent Itemset Mining Using FP-Trees
IEEE Transactions on Knowledge and Data Engineering

GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets
Data Mining and Knowledge Discovery

A Thorough Experimental Study of Datasets for Frequent Itemsets
ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining

Parallel Leap: Large-Scale Maximal Pattern Mining in a Distributed Environment
ICPADS '06 Proceedings of the 12th International Conference on Parallel and Distributed Systems - Volume 1

Cache-conscious frequent pattern mining on modern and emerging processors
The VLDB Journal — The International Journal on Very Large Data Bases

Optimization of frequent itemset mining on multiple-core processor
VLDB '07 Proceedings of the 33rd international conference on Very large data bases

Efficient mining of maximal frequent itemsets from databases on a cluster of workstations
Knowledge and Information Systems

A view selection algorithm with performance guarantee
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology

Emerging Cubes: Borders, size estimations and lossless reductions
Information Systems

Standing Out in a Crowd: Selecting Attributes for Maximum Visibility
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering

PADS: a simple yet effective pattern-aware dynamic search method for fast maximal frequent pattern mining
Knowledge and Information Systems

Constructing and exploring composite items
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data

Discovering Conditional Functional Dependencies
IEEE Transactions on Knowledge and Data Engineering

Abstract

The border concept was introduced by Mannila and Toivonen in their seminal paper [20]. It has many applications, e.g., maximal frequent itemsets, minimal functional dependencies, emerging patterns between consecutive database instances, and materialized view selection. For large transactional and relational databases defined on n items or attributes, the running time of any border computation is dominated mainly by the time T (for standard sequential algorithms) required to test the interestingness, in general the frequency, of sets of candidates. In this paper we propose a general parallel algorithm for computing borders, whatever the application. We establish the efficiency of our algorithm by showing that: (i) it generates exactly the same number of candidates as the standard sequential algorithm and, (ii) if the interestingness test time of a candidate is bounded by Δ then for a multi-processor shared memory machine with p cores, we prove that the total interestingness time Tp