Cover similarity based item set mining

Authors:
Marc Segond;Christian Borgelt
Affiliations:
European Centre for Soft Computing, Mieres, Asturias, Spain;European Centre for Soft Computing, Mieres, Asturias, Spain
Venue:
Bisociative Knowledge Discovery
Year:
2012

Citing 11
Cited 2

Advances in knowledge discovery and data mining

Advances in knowledge discovery and data mining
Fast discovery of association rules

Advances in knowledge discovery and data mining
Mining frequent patterns without candidate generation

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
KDD-Cup 2000 organizers' report: peeling the onion

ACM SIGKDD Explorations Newsletter - Special issue on “Scalable data mining algorithms”
Real world performance of association rule algorithms

Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Discovering Frequent Closed Itemsets for Association Rules

ICDT '99 Proceedings of the 7th International Conference on Database Theory
Fast Algorithms for Mining Association Rules in Large Databases

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Fast vertical mining using diffsets

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
K-Optimal Rule Discovery

Data Mining and Knowledge Discovery
LCM ver.3: collaboration of array, bitmap and prefix tree for frequent itemset mining

Proceedings of the 1st international workshop on open source data mining: frequent pattern mining implementations
Discovering Significant Patterns

Machine Learning

Towards bisociative knowledge discovery

Bisociative Knowledge Discovery
Network creation: overview

Bisociative Knowledge Discovery

Quantified Score

Hi-index	0.00

Visualization

Abstract

In standard frequent item set mining one tries to find item sets the support of which exceeds a user-specified threshold (minimum support) in a database of transactions. We, instead, strive to find item sets for which the similarity of the covers of the items (that is, the sets of transactions containing the items) exceeds a user-defined threshold. This approach yields a much better assessment of the association strength of the items, because it takes additional information about their occurrences into account. Starting from the generalized Jaccard index we extend our approach to a total of twelve specific similarity measures and a generalized form. In addition, standard frequent item set mining turns out to be a special case of this flexible framework. We present an efficient mining algorithm that is inspired by the well-known Eclat algorithm and its improvements. By reporting experiments on several benchmark data sets we demonstrate that the runtime penalty incurred by the more complex (but also more informative) item set assessment is bearable and that the approach yields high quality and more useful item sets.